The Linux Philosophy for SysAdmins, Tenet 07 — Automate Everything
Image by Opensource.com: CC-By-SA
Author’s note: This article is excerpted in part from chapter 9 of my book, The Linux Philosophy for SysAdmins, with some changes to update the information in it and to better fit this format.
What is the function of computers?
The right answer is, “to automate mundane tasks in order to allow us humans to concentrate on the tasks that the computers cannot – yet – do.” As SysAdmins, the people who run and manage the computers most closely, we have direct access to the tools that can help us work more efficiently. We should use those tools to maximum benefit.
In this article we explore using automation to make our own lives as SysAdmins easier.
Why I automate everything
In my article, Be the Lazy SysAdmin, I state, “A SysAdmin is most productive when thinking – thinking about how to solve existing problems and about how to avoid future problems; thinking about how to monitor Linux computers in order to find clues that anticipate and foreshadow those future problems; thinking about how to make their job more efficient; thinking about how to automate all of those tasks that need to be performed whether every day or once a year.”
Unfortunately, most Pointy-Haired Bosses (PHBs) don’t realize that thinking is critical to the performance of our duties. I explore this problem in my article, How to be the lazy sysadmin.
SysAdmins are next most productive when creating the shell programs that automate the solutions they have conceived while appearing to be unproductive — at least to the PHB. The more automation we have in place, the more time we have available to fix real problems when they occur and to contemplate how to automate even more than we already have.
I have learned that, for me at least, writing shell programs – also known as scripts – is the best strategy for leveraging my time. Once a shell program has been written, it can be rerun as many times as needed.
I can update my shell scripts as needed to compensate for changes from one release of Linux to the next. Other factors that might require making these changes are the installation of new hardware and software, changes in what I want or need to accomplish with the script, adding new functions, removing functions that are no longer needed, and fixing the not-so-rare bugs in my scripts. These kinds of changes are just part of the maintenance cycle for any type of code.
Every task performed via the keyboard in a terminal session by entering and executing shell commands can and should be automated. SysAdmins should automate everything we are asked to do or that we decide on our own needs to be done. Many times I have found that doing the automation up front saves time the first time.
One bash script can contain anywhere from a few commands to many thousands. In fact, I have written bash scripts that have only one or two commands in them. Another script I have written contains over 2,700 lines, more than half of which are comments.
How I got here
How did I get to the point of “automate everything?”
Have you ever performed a long and complex task at the command line thinking, “Glad that’s done — I never have to worry about it again.”? I have — very frequently. I ultimately figured out that almost everything that I ever need to do on a computer, whether mine or one that belongs to my employer or one of my consulting customers, will need to be done again sometime in the future.
My personal main reason for automating everything is that any task that must be performed once will certainly need to be done again. By collecting the commands required to perform the task into a file to use as a shell program, it becomes easy to run that exact same set of commands at a later time.
For me automation also means that I don’t have to remember or recreate the details of how I performed that task in order to do it again. It takes time to remember how to do things and time to type in all of the commands. This can become a significant time sink for tasks that require typing large numbers of long commands. Automating tasks by creating shell scripts reduces the typing necessary to perform my routine tasks.
Shell programs can also be an important aid to newer SysAdmins to enable them to keep things working while the senior SysAdmin is out on vacation or ill. Because shell programs are inherently open to view and change, they can be an important tool for less experienced SysAdmins to learn the details of how to perform these tasks when they need to be responsible for them.
Updates
I frequently install updates on all of my computers. In fact I have been doing updates this morning. This is a task that requires only a couple decisions and can be easily automated. “But that is so simple, why automate a task that requires only a command or two?” It turns out that updates are not so simple. Let’s think about this for a minute.
First I must determine whether any updates are available. Then I need to determine whether a package that requires a reboot is being updated, such as the kernel or glibc. At this point I can install the update. Before I do a reboot, assuming one is required, I run the mandb utility to update the man pages; if this is not done, new and replacement man pages won’t be accessible and old ones that have been removed will appear to be there even though they are not. Finally, if a reboot is needed, I do that. A reboot is usually a good idea after updates to the kernel, glibc, or systemd.
That is a non-trivial set of individual tasks and commands that require some decisions. Doing those tasks manually requires paying attention and some intervention to enter new commands when the previous ones complete. Because of the need to babysit while waiting to enter the next command, it took a great deal of my time to monitor each computer as it went through the procedure. There was also room for error, as I was reminded occasionally when I entered the wrong command on a host.
And I usually need to do all that on the many computers in my lab and at least a few VMs.
Using the statement of requirements I created above, because that is what that paragraph really is, it was easy to automate this and eliminate all of those issues. I wrote a little script that I call doUpdates.sh. It is a little over 400 lines in length and provides options like help, verbose mode, printing the current version number, and an option to reboot only if the kernel or glibc has been updated.
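The full program is much too long to reproduce here, but a minimal sketch of just the core decision logic, assuming a dnf-based distribution and not representing the actual doUpdates.sh code, might look something like this:

#!/bin/bash
# Minimal sketch of the update logic described above (not the real doUpdates.sh).
# Get the list of available updates; exit if there are none.
updates=$(dnf check-update --quiet | awk '{print $1}')
[ -z "$updates" ] && echo "No updates available." && exit 0
# Decide whether a reboot will be needed after the updates are installed.
doReboot=0
echo "$updates" | grep -qE "^(kernel|glibc|systemd)" && doReboot=1
# Install the updates and rebuild the man page database.
dnf -y update
mandb
# Reboot only if one of the trigger packages was updated.
[ $doReboot -eq 1 ] && reboot
exit 0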
I use the .sh extension to the filename to indicate that the file is a shell script. That makes shell scripts easy to find using simple Linux tools.
Over half of the lines in this program are comments so I can remember how the program works the next time I need to work on it to fix a bug or add a little more function. Much of the basic function is copied from a template file that maintains all of the standard components that I use in every script I write. Because the framework for new scripts is always there, it is easy to start new ones.
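My template file is also too long to include here, but a stripped-down sketch of the kind of framework it provides – header comments, a help procedure, and standard option processing – might look like this (the details are illustrative, not my actual template):

#!/bin/bash
################################################################################
# program-name.sh                                                              #
# Short description of what the program does.                                  #
#                                                                              #
# Change history:                                                              #
# 2024/01/01  Initial version.                                                 #
################################################################################

# Print the help text.
Help()
{
   echo "Usage: $(basename $0) [-h] [-v]"
   echo "  -h  Print this help and exit."
   echo "  -v  Verbose mode."
}

verbose=0

# Process the command line options.
while getopts "hv" option ; do
   case $option in
      h) Help ; exit 0 ;;
      v) verbose=1 ;;
      *) Help ; exit 1 ;;
   esac
done

# The main body of the program starts here.
[ $verbose -eq 1 ] && echo "Verbose mode is on."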
You can download the doUpdates.sh program from the Downloads page. You can follow the next few paragraphs more easily if you do that now so you can view the code.
The doUpdates.sh script should be located in /usr/local/bin in accordance with the Linux Filesystem Hierarchy Standard (FHS). It can be run with the command doUpdates.sh -ur, in which the -u option performs the updates and the -r option causes it to reboot the host only if one of the conditions for a reboot is met.
I won’t deconstruct the entire doUpdates.sh program for you, but there are some things to which I want to call your attention. First, notice the number of comments; these are to help me remember what each section is supposed to do. The first lines of the program after the shebang (#!/bin/bash) contain the name of the program, a short description of its function, and a maintenance or change history. This first section is based on some practices I was taught and followed while I worked at IBM. Other comments delineate the various procedures and major sections and provide a short description of each. Finally, shorter comments embedded in the code describe the function or objective of smaller bits of code such as flow control structures.
I have a large number of procedures at the beginning of the script; bash requires that procedures be defined before they are called, so that is where they go. These procedures are from my template script, and I use them whenever possible in new scripts to save the effort of rewriting them.
The procedure and variable names are meaningful, and some use uppercase for one or two characters. This makes for easier reading and helps the programmer (me) and any future maintainers (also me) understand the functions of the procedures and variables. Yes, this does seem to be contrary to one of the other tenets of the Philosophy, but making the code more readable saves far more time in the long run. I know this from several past experiences with code of my own and that of others.
One organization I did some consulting work for started me with the task of fixing some bugs in a number of scripts. I took one look at the scripts and knew it would take a lot of work to fix the actual bugs because I first had to fix the readability of the scripts. I started by adding comments to the scripts because there were none. I then started renaming variables and procedures so that it was easier to understand the purpose of those variables and the nature of the data they held. It was only after making those changes that I could begin to understand the nature of the bugs they were experiencing.
Additional levels of automation
Now I have this incredibly wonderful and useful script. I have copied it to /usr/local/bin on all of my computers. All I have to do now is run it at appropriate times on each of my Linux hosts to do the updates. I can do this by using SSH to login to each host and run the program.
But wait! There’s more! Have I told you yet how absolutely cool SSH is?
The ssh command provides secure, encrypted logins to remote computers, giving access to a remote shell session in which commands can be run. So I can log in to a remote computer and run the doUpdates.sh command on that remote computer, with the results displayed in the terminal window on my local host; the Standard Output (STDOUT) from the remote command appears just as if I had run it locally.
# ssh hostname doUpdates.sh -ur
Figure 1: This command runs the doUpdates.sh program on a remote host using Public/Private KeyPairs for authentication. The -u means do the updates and -r means reboot if the required conditions are met.
That part is trivial, but the next step is a bit more interesting. Rather than maintain a terminal session on the remote computer, I can simply use a command on my local computer such as the one in Figure 1 to run the same command on the remote computer with the results being displayed on the local host. This assumes that SSH public/private keypairs (PPKP) are in use and I do not have to enter a password each time I issue a command to the remote host.
So now I run a single command on my local host that sends a command through the SSH tunnel to the remote host. OK, that is good, but what does it mean?
It means that what I can do for a single computer I can also do for several – or several hundred. The bash command line program in Figure 2 illustrates the power I now have.
# for I in host1 host2 host3 ; do ssh $I doUpdates.sh -r ; done
Figure 2: This bash command line program runs the doUpdates.sh program on three remote hosts.
Think we’re done? No, we are not! The next step is to create a short bash script of this CLI program so we don’t have to retype it every time we want to install updates on our hosts. This does not have to be fancy; the script can be as simple as the one in Figure 3.
#!/bin/bash
for I in localhost host1 host2 host3 ; do ssh $I doUpdates.sh -r ; done
Figure 3: This bash script contains the command line program that runs the doUpdates.sh program on the local host and three remote hosts.
This script could be named “updates” or something else depending on how you like to name scripts and what you see as its ultimate function. I think we should call this script, “doit”. Now we can just type a single command and run a smart update program on as many hosts as we have in the list of the for statement. Our script should be located in the /usr/local/bin directory so it can be easily run from the command line.
Our little doit script looks like it could be the basis for a more general tool. We could add more code to doit that would enable it to take arguments or options, such as the name of a command to run on all of the hosts in the list. This enables us to run any command we want on a list of hosts, so our command to install updates might be doit doUpdates.sh -r, or doit myprogram to run “myprogram” on each host.
The next step might be to take the list of hosts out of the program itself and place it in a doit.conf file located in /usr/local/etc – again in compliance with the Linux FHS. That command would look like Figure 4 for our simple doit script. Notice the backticks (`) that create the list used by the for structure from the results of the cat command.
#!/bin/bash
for I in `cat /usr/local/etc/doit.conf` ; do ssh $I doUpdates.sh ; done
Figure 4: We have now added a simple external list that contains the host names on which the script will run the specified command.
By keeping the list of hosts separate, we can allow non-root users to modify the list of hosts while protecting the program itself against modification. It would also be easy to add an -f option to the doit program so that the users could specify the name of a file containing their own list of hosts on which to run the specified program.
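A minimal sketch of what that more general doit might look like, with both the command argument and a hypothetical -f option – this is an illustration, not the exact program I use – could be:

#!/bin/bash
# Sketch of a more general doit: run an arbitrary command on a list of hosts.
hostfile=/usr/local/etc/doit.conf

# Process the -f option, which names an alternate host list file.
while getopts "f:" opt ; do
   case $opt in
      f) hostfile=$OPTARG ;;
      *) echo "Usage: doit [-f hostfile] command [arguments]" ; exit 1 ;;
   esac
done
shift $((OPTIND - 1))

# The remaining arguments are the command to run on each host.
[ $# -eq 0 ] && echo "Usage: doit [-f hostfile] command [arguments]" && exit 1

for host in $(cat $hostfile) ; do
   echo "===== $host ====="
   ssh $host "$@"
done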
Finally, we might want to set this up as a cron job so that we don’t have to remember to run it on whatever schedule we want. Setting up cron jobs is worthy of its own section, so that is coming up next.
Using cron for timely automation
There are many tasks that need to be performed off-hours when no one is expected to be using the computer or, even more importantly, on a regular basis at specific times. I don’t want to have to get up at oh-dark-hundred to start a backup or major update, so I use the cron service to schedule tasks on a repetitive basis, such as daily, weekly, or monthly. Let’s look at the cron service and how to use it.
I use the cron service to schedule obvious things like regular backups that occur every day at 2:00AM. I also do a couple less obvious things. All of my many computers have their system times, that is the operating system time, set using NTP – the Network Time Protocol. NTP sets the system time; it does not set the hardware time which can drift and become inaccurate. I use cron to set the hardware time using the system time. I also have a bash program I run early every morning that creates a new “message of the day” (MOTD) on each computer that contains information such as disk usage that should be current in order to be useful. Many system processes use cron to schedule tasks as well. Services like logwatch, logrotate, and rkhunter, all use the cron service to run programs every day.
The crond daemon is the background service that enables cron functionality.
The cron service checks for files in the /var/spool/cron and /etc/cron.d directories, and the /etc/anacrontab file. The contents of these files define cron jobs that are to be run at various intervals. The individual user cron files are located in /var/spool/cron, and system services and applications generally add cron job files in the /etc/cron.d directory. The /etc/anacrontab is a special case that will be covered a bit further on.
crontab
Each user, including root, can have a cron file. By default no file exists, but using the crontab -e command to edit a cron file creates one in the /var/spool/cron directory. I strongly recommend that you not edit these files directly with a standard editor such as vi, vim, emacs, nano, or any of the many other editors that are available. Using the crontab command not only allows you to edit the cron file, it also ensures that the crond daemon picks up the changes when you save and exit from the editor. The crontab command in Figure 5 uses vi as its underlying editor by default, unless the EDITOR or VISUAL environment variable specifies another, because vi is always present on even the most basic of installations.
# crontab -e
Figure 5: This command launches the vi editor in a special mode that restarts the cron service when quitting.
Each cron file is empty the first time you edit it, so you must create it from scratch. I added the job definition example in Figure 6 to my own cron files just as a quick reference. Feel free to copy it for your own use.
SHELL=/bin/bash
MAILTO=root@example.com
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0-6)(Sunday=0 or 7)(sun,mon,tue,wed,thu,fri,sat)
# | | | | |
# * * * * * user-name command to be executed
# backup using the rsbu program to the internal HDD then the external USB HDD
01 01 * * * /usr/local/bin/rsbu -vbd1 ; /usr/local/bin/rsbu -vbd2
# Set the hardware clock to keep it in sync with the more accurate system clock
03 05 * * * /sbin/hwclock --systohc
# Perform monthly updates on the first of the month
25 04 1 * * /usr/local/bin/doit
Figure 6: The crontab file on my primary workstation runs three different programs, each on its own schedule.
The first three lines of the crontab file set up a default environment. Setting the environment to that necessary for a given user is required because cron does not provide an environment of any kind. The SHELL variable specifies the shell to use when commands are executed. In this case it specifies the bash shell. The MAILTO variable sets the email address to which cron job results will be sent. These emails can provide the status of backups, updates, or whatever, and consist of the output from the programs that you would see if you ran them manually from the command line. The last of these three lines sets up the PATH for this environment. Regardless of the path set here, however, I always like to prepend the fully qualified path to each executable.
There are several comment lines that detail the syntax required to define a cron job. I think that they are mostly self-explanatory so I will use those entries as examples, then add a few more that will show you some of the more advanced capabilities of crontab files.
The line shown in Figure 7 runs another of my bash shell scripts, rsbu, to perform backups of all my systems. This job is kicked off at 1 minute after 1AM every day. The splat/star/asterisks (*) in positions 3, 4, and 5 of the time specification are like file globs for those time divisions; they match every day of the month, every month, and every day of the week. This line runs my backups twice; once to backup onto an internal dedicated backup hard drive, and once to backup onto an external USB hard drive that I can take to the safe deposit box.
01 01 * * * /usr/local/bin/rsbu -vbd1 ; /usr/local/bin/rsbu -vbd2
Figure 7: This line in my crontab file runs a script that performs daily backups for my systems.
The line shown in Figure 8 sets the hardware clock on the computer using the system clock as the source of an accurate time. This line is set to run at 3 minutes after 5AM every day.
03 05 * * * /sbin/hwclock --systohc
Figure 8: This line sets the hardware clock using the system time as the source.
The last cron job, shown in Figure 9, is the one we are especially interested in. It is used to perform our updates at 04:25AM on the first day of each month. This assumes we are using the very simple doit program we created above. The cron service has no option for “the last day of the month,” so we use the first day of the following month.
25 04 1 * * /usr/local/bin/doit
Figure 9: The cron job for running the doit command which in turn runs doUpdates.
So now all of the hosts in our network get updated each month with no intervention at all from us. This is the ultimate in being the Lazy SysAdmin.
Of course we still need to verify that the backups are being made and that they contain the files and filesystems that we intended to back up. And we should also test performing a restore of the backed up data on a regular basis. There’s nothing worse than telling the boss that we can restore lost files from a backup only to find that the backup is corrupted.
Other cron options
There are some other options provided by the cron service that we can also use to run our doit program on a regular basis. The directory /etc/cron.d is where some applications install cron files; programs that do not run under a specific user account still need a place to locate their cron files, so they are placed in /etc/cron.d. These cron files have the same format as a user cron file, except that they also specify the user under which the command is to be run.
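For example, the /etc/cron.d/0hourly file, which kicks off the hourly jobs described a little further on, typically looks something like this on Fedora-based systems (your distribution’s stock file may differ):

# Run the hourly jobs
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
01 * * * * root run-parts /etc/cron.hourly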
Scheduling tips
Some of the times I have set in the crontab files for my various systems seem rather random, and to some extent they are. Trying to schedule cron jobs can be challenging, especially as the number of jobs increases. I usually have only a couple of tasks to schedule on each of my own computers, so it is a bit easier than in some of the production and lab environments in which I have worked.
One system for which I was the SysAdmin usually had around a dozen cron jobs that needed to run every night and an additional three or four that had to run on weekends or the first of the month. That was a challenge because if too many jobs ran at the same time, especially the backups and compiles, the system would run out of RAM and then nearly fill the swap file, which resulted in system thrashing and performance so poor that nothing got done. We added more memory and were able to do a better job of scheduling tasks. Adjusting the task list included removing one of the tasks, which was very poorly written and used large amounts of memory.
anacron
The crond service assumes that the host computer runs all the time. What that means is that if the computer is turned off for a period of time during which cron jobs were scheduled, those jobs will be skipped and will not run until the next time they are scheduled. This might cause problems if the cron jobs that did not run were critical. So there is another option for running jobs at regular intervals when the computer is not expected to be on all the time.
The anacron program performs the same function as crond but it adds the ability to run jobs that were skipped if the computer was off or otherwise unable to run the job for one or more cycles. This is very useful for laptops and other computers that get turned off or put in sleep mode.
As soon as the computer is turned on and booted, anacron checks to see whether configured jobs have missed their last scheduled run. If they have, those jobs are run immediately, but only once no matter how many cycles have been missed. For example, if a weekly job was not run for three weeks because the system was shut down while you were away on vacation, it would be run soon after you turn the computer on, but it would be run once not three times.
The anacron program provides some easy options for running regularly scheduled tasks. Just install your scripts in the /etc/cron.[hourly|daily|weekly|monthly] directories, depending on how frequently they need to be run.
How does this work? The sequence is simpler than it first appears.
- The crond service runs the cron job specified in /etc/cron.d/0hourly.
- The cron job specified in /etc/cron.d/0hourly runs the run-parts program once per hour.
- The run-parts program runs all of the scripts located in the /etc/cron.hourly directory.
- The /etc/cron.hourly directory contains the 0anacron script which runs the anacron program using the /etc/anacrontab configuration file.
- The anacron program runs the programs located in /etc/cron.daily once per day; it runs the jobs located in /etc/cron.weekly once per week, and the jobs in cron.monthly once per month. Note the delay times specified in each line of the anacrontab file, shown below, which help prevent these jobs from overlapping each other and other cron jobs.
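On Fedora-based systems the stock /etc/anacrontab looks something like this (your distribution’s version may differ slightly); note the delay values in the second column and the START_HOURS_RANGE variable:

# /etc/anacrontab: configuration file for anacron
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22
#period in days   delay in minutes   job-identifier   command
1         5    cron.daily      nice run-parts /etc/cron.daily
7         25   cron.weekly     nice run-parts /etc/cron.weekly
@monthly  45   cron.monthly    nice run-parts /etc/cron.monthly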
Instead of placing complete bash programs in the cron.X directories, I install them in the /usr/local/bin directory, which allows me to run them easily from the command line. Then I add a symlink in the appropriate cron directory, such as /etc/cron.daily.
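With a hypothetical script name, that installation looks something like this:

# cp myscript.sh /usr/local/bin/
# chmod +x /usr/local/bin/myscript.sh
# ln -s /usr/local/bin/myscript.sh /etc/cron.daily/myscript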
The anacron program is not designed to run programs at specific times. Rather, it is intended to run programs at intervals that begin at the specified times, such as 3AM (see the START_HOURS_RANGE line above) of each day, on Sunday to begin the week, and on the first day of the month. If any one or more cycles are missed, then anacron will run the missed jobs one time as soon as possible.
Thoughts about cron
I use most of these methods for scheduling tasks to run on my computers. All of those tasks are ones that need to run with root privileges. I have seen only a few times when users had a real need for any type of cron job, one of those being for a developer to kick off a daily compile in a development lab.
It is important to restrict access to cron functions by non-root users. However, there are circumstances when it may be necessary for a user to set tasks to run at pre-specified times, and cron can allow them to do that when necessary. SysAdmins realize that many users do not understand how to properly configure these tasks using cron, and that users make mistakes in the configuration. Those mistakes may be harmless, but they can also cause problems for the users themselves and for other users. By setting procedural policies that cause users to interact with the SysAdmin, those individual cron jobs are much less likely to interfere with other users and other system functions.
It is possible to set limits on the total resources that can be allocated to individual users or groups, but that is an article for another time.
cron Resources
The man pages for cron, crontab, anacron, anacrontab, and run-parts all have excellent information and descriptions of how the cron system works.
Other automation possibilities
I have automated many other tasks that I need to perform on the Linux computers for which I am responsible. The short list below is certainly not all-inclusive, but is just intended to give you ideas for some places to start.
- Backups
- Upgrades (dnf-upgrade)
- Distributing updates to local shell scripts to a list of hosts.
- Finding and deleting very old files (cruft).
- Creating a daily message of the day (/etc/motd).
- Checking for viruses, rootkits, and other malware.
- Change/add/delete mailing list subscriber email addresses.
- Regular checks of the host’s health such as temperatures, disk usage, RAM usage, and CPU usage.
- Anything else repetitive.
Deepening the philosophy
Automation of the SysAdmin’s own work is a large part of that work. Because of this, many tenets of the Linux Philosophy for SysAdmins are related to the tasks and tools that support automation using shell scripts and ad hoc command line programming.
Computers are designed to automate various mundane tasks, so why should that not also be applied to the SysAdmin’s work? We lazy SysAdmins use the capabilities of the computers on which we work to make our jobs easier. Automating everything that we possibly can means that the time we free up by creating that automation can now be used to respond to some real or perceived emergency by others, especially by the PHB. It can also provide us with time to automate even more.
Automation is not merely about creating a program to perform every task. It can be about making those programs flexible so that they can be used in multiple ways such as the ability to be called from other scripts and to be called as a cron job.
My programs almost always use options to provide flexibility. The doit program used in this article could easily be expanded to be more general than it is while still remaining quite simple. It could still do one thing well if its objective were to run a specified program on a list of hosts.
My shell scripts did not just spring into existence with hundreds or thousands of lines. In most cases they start as a single ad hoc command line program. I create a shell script from the ad hoc program. Then another command line program is added to the short script. Then another. As the short script becomes longer I add comments, options, and a help feature.
Then, sometimes, it makes sense to make a script more general so that it can handle more cases. In this way the doit script becomes capable of “doing it” for more than just a single program that does updates.
But if we did that, our doit script might become Ansible.