Linux Server: to Reboot or Not to Reboot?
Linux servers have a reputation as workhorses. Since very early in the development of Linux, its users have boasted about the stability of the OS. In fact, it is not uncommon to hear of Linux-based servers running for years without the need for a reboot. This raises the question: how often should you reboot your Linux server?
Months and months of server uptime can be a good thing (and for some, even cause for boasting), but is it wise to go such a long time without rebooting? I would strongly argue that it is not. In fact, a wise server recovery/contingency plan will include reboots as part of a regular maintenance schedule. Below I outline some reasons why you should reboot your server on a regular basis.
Kernel Upgrades
The Linux kernel is under constant development. New drivers are always being written, old ones are rewritten, bugs are patched, and security holes are plugged. These upgrades generally result in a system that is faster, safer, and more reliable. Package managers upgrade the kernel regularly in most distributions. But even if your distribution doesn’t automatically upgrade your kernel, for the aforementioned reasons you should make it a point to do so periodically.
In order for the upgraded kernel to run, the system needs to be rebooted. Some distros notify the user when a reboot is required, but it is ultimately the responsibility of the sysadmin to know what software is being upgraded and what actions those upgrades require.
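As a rough illustration of how a sysadmin might check for a pending reboot, here is a minimal Python sketch. It rests on two assumptions that are not universal: that the distribution creates the flag file /var/run/reboot-required when an upgrade needs a reboot (a Debian/Ubuntu convention), and that each installed kernel has its own directory under /lib/modules. The function names are hypothetical, and the second check is only a heuristic: it reports kernels other than the running one, not necessarily newer ones.

```python
import platform
from pathlib import Path


def reboot_flag_present(flag_path="/var/run/reboot-required"):
    """Return True if the distro's reboot flag file exists.

    Assumption: the flag file is a Debian/Ubuntu convention; other
    distributions may signal a pending reboot differently or not at all.
    """
    return Path(flag_path).exists()


def installed_kernels(modules_dir="/lib/modules"):
    """List kernel releases that have a module directory, sorted by name."""
    directory = Path(modules_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_dir())


def other_installed_kernels(running=None, modules_dir="/lib/modules"):
    """Return installed kernel releases that differ from the running one.

    Heuristic only: a non-empty result means another kernel is on disk,
    which may be newer or older than the one currently running.
    """
    running = running or platform.release()
    return [k for k in installed_kernels(modules_dir) if k != running]
```

Calling reboot_flag_present() and other_installed_kernels() from a daily cron job or a login script would give an early hint that an upgraded kernel is waiting for the next reboot.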
Real-World Reliability Testing
Any sysadmin who has been at it for a while has experienced this scenario:
Something happens that causes the server to shut down: perhaps a hardware addition or replacement, a power loss, or the need to move the machine. Once the interruption is over, the admin boots the server only to find that things aren’t working as they should. Some critical service failed to start properly. What happened? As software packages are updated and new versions are released, many variables come into play that affect the normal operation of that software. A configuration setting might become deprecated. A hack that was used to fix a bug in an old version may render the new version useless. The list goes on.
As the time between reboots increases, so does the likelihood that some service will not initialize properly. These errors take time to diagnose and correct, which translates to unplanned server downtime. The problem is compounded when two or three issues surface on a single reboot. Rebooting on a regular schedule allows the sysadmin to catch these types of errors quickly, while only a few changes have accumulated since the last known-good boot. Because users are informed ahead of time that the server will be down for maintenance, it also provides time to correct the errors without work grinding to a halt.
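One way to catch such startup failures right after a scheduled reboot is to ask the init system which units failed. The sketch below assumes a systemd-based distribution where the systemctl command is available; the function names are hypothetical, and the parsing assumes the unit name is the first column of each output line.

```python
import subprocess


def parse_failed_units(systemctl_output):
    """Extract unit names from `systemctl list-units --state=failed` output."""
    units = []
    for line in systemctl_output.splitlines():
        # Some systemctl versions prefix failed units with a bullet marker.
        fields = line.replace("●", " ").split()
        if fields:
            units.append(fields[0])
    return units


def failed_units():
    """Return the names of units that systemd currently reports as failed.

    Assumption: systemd is the init system and `systemctl` is on PATH.
    """
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--no-legend", "--plain"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_failed_units(result.stdout)
```

Running failed_units() as the last step of the maintenance window, and treating any non-empty result as a reason to investigate before users return, turns the reboot into a repeatable test rather than a surprise.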
While it is true that services can be restarted individually, nothing can accurately simulate a full reboot. And the longer you wait between reboots, the greater the chance of something going wrong. Remember: You will never experience a routine reboot until you implement a reboot routine.