Server configuration rollout


In the previous article, I brought my configuration files under version control. Now I want to automatically install the updates provided on the server.

What is this about?

  • Writing a script to automatically detect the update trigger
  • Server applications should be shut down and a backup copy created
  • The script should distribute the configuration on the system
  • All applications should then be restarted

Ways and possibilities

Here, too, I spent several hours researching. For larger projects, infrastructure experts seem to use specialized automation tools: Ansible, simpler tools such as cdist, or even Kubernetes for highly scalable services.

Automation services Ansible, cdist, Kubernetes

Ansible and cdist seem to follow a similar concept: On the source machine (in this example my laptop) I create the configuration for my server and store this and the instructions for configuring my services in a “playbook” (Ansible) or in “types” (cdist).

To put it simply - as I understand it - the tool then takes care of building the configuration at the push of a button, dialing into the target host via ssh, pushing it over there and then starting it. For Ansible, there are even tutorials like this one for my scenario with docker-compose, which makes getting started even easier.

Kubernetes takes a different approach. It sees itself as more of a container manager, load balancer and scaling agent, but can do similar things for my purposes.

My approach

I, on the other hand, only need a fraction of the capabilities of these programs. I am also put off by the “additional” ssh channel, the configuration effort, the additional programs sometimes required on the target system, and the necessary reading and selection of the best tool for me. Because thanks to my very simple pipeline from the last article, the configuration is already on my server. It “only” needs to be copied to the right places and the affected services restarted.

Therefore, I am trying to solve this problem with on-board tools, a working brain, and a few searches in relevant forums on the topics “executing a script when a file changes” and “copying folder structures in Linux”.

Risk of circular reference

However, I am taking a risk: Since the deployment runs via Gitea, but the configuration affects Gitea itself, if there is an error in this module I can no longer change or reset anything: The Gitea service is then broken. I would have to get the configuration up and running again manually on the server.

I’ll try it out anyway and see if I am actually confronted with such a problem. If so, I’ll just switch to cdist and document it in a separate article! 🤗

Preparation: Create folder system and scripts

I think about it for a moment and create a few folders in the server file system:

# /opt/server-config
mkdir automation-hooks-trigger
mkdir automation-hooks-handler

These folders should represent the two sides of the automation. trigger contains text files that can be manipulated from the (web) service side. For example, act_runner should write the file server-config-update as soon as an update is available.

service -> writes to trigger file     server_handler() -> trigger_file.changed ? run_action() : loop()

The handler folder contains script files that make the necessary changes to the server file system.

Implementation: Shell script to handle the update

In the previous article, I executed the command touch server-config-update.txt in the act_runner container to signal the presence of a new server configuration. On the server, I now use the following code to periodically check whether this file has changed.

#!/bin/bash
# /opt/server-config/automation-hooks-handler/server-config-handler.sh

### set directories
actionfile=/opt/server-config/automation-hooks-handler/server-config-action.sh
triggerfile=/opt/server-config/automation-hooks-trigger/server-config-update.txt

### Set initial time of file
LTIME=$(stat -c %Z "${triggerfile}")

while true
do
   ATIME=$(stat -c %Z "${triggerfile}")

   if [[ "$ATIME" != "$LTIME" ]]
   then
       ${actionfile} 2>&1
       LTIME=$ATIME
   fi
   sleep 10
done

The shebang #!/bin/bash tells the operating system which interpreter to run the script with. The command stat -c %Z queries when the file was last changed and returns the time in Epoch format (seconds since 1970-01-01). The time taken at the start of the script is compared with the time taken in the loop; on a change, the deploy routine that is still to be written can run, and the reference time is updated. Finally, sleep 10 pauses the script for ten seconds before the next check. I use absolute paths so I can start the script both from the console and via cron or systemd.
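What stat -c %Z returns can be checked quickly with a throwaway file (demo path in /tmp, not part of the server setup):

```shell
# stat -c %Z prints the last status change (ctime) in Epoch seconds
triggerdemo=/tmp/trigger-demo.txt
touch "$triggerdemo"
LTIME=$(stat -c %Z "$triggerdemo")   # e.g. 1718000000
sleep 1
touch "$triggerdemo"                 # "modify" the file
ATIME=$(stat -c %Z "$triggerdemo")   # now at least one second later
```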

First test and hooking into “autostart”

Now we need to make the file executable for a first test:

# change file mode bits: add "executable" flag to server-config-handler script
chmod +x /opt/server-config/automation-hooks-handler/server-config-handler.sh

Next, I create the server-config-action.sh file and add just an echo command. To test the setup, I create my update trigger file locally and start the script. I then modify the trigger file in another shell using touch server-config-update.txt, and --- CONFIG UPDATE TRIGGER detected --- appears in the console. Great!

Later, on the server, the script needs to run automatically after a restart. To do this, I use the cron scheduler and create a new entry using crontab -e:

# cron can automatically execute recurring tasks
# in this case, we're running server-config-handler script on reboot
@reboot bash /opt/server-config/automation-hooks-handler/server-config-handler.sh

With the command ps aux I can now check whether the script is actually being executed:

schallbert@server: ps aux
[...]
8:15   0:00 /bin/bash ./server-config-handler.sh
8:15   0:00 sleep 10
[...]

Backup

Making a backup of my applications and files before rolling out the update makes total sense. So I tell Borg to create a backup now. Of course, I have to stop all services first, so that all data is accessible, coherent, and static.

Freeze state and data

To do this, I create a script in automation-hooks-handler that automatically shuts down all containers except borgmatic.

#! /bin/bash
# /opt/server-config/automation-hooks-handler/backup-pre-action.sh
# this shell script shuts down all docker containers prior to backup

echo "Shutting down containers for backup:"

echo "watchtower..."
cd /opt/watchtower
docker compose down 2>&1
# [...]

The expression 2>&1 redirects any error output to standard output, so it also appears on the console: 1 is the file descriptor for stdout, 2 is stderr, and >& redirects one descriptor to another. Later, this is the point where we could write the output to a log file - but I’ll leave that out for now for the sake of simplicity.
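For that later log file, the same operator can simply point at a file instead of the console. A minimal sketch (path and messages are made up for the demo):

```shell
#!/bin/bash
# append both stdout and stderr of a command group to a log file
logfile=/tmp/backup-pre-action.log
: > "$logfile"                   # start with an empty log
{
    echo "regular message"       # goes to fd 1 (stdout)
    echo "error message" >&2     # goes to fd 2 (stderr)
} >> "$logfile" 2>&1             # 2>&1 points fd 2 at fd 1's target
```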

Borgmatic: Trigger handler mechanism #2 and #3

I can now run the script in two ways:

  1. As a call by the server-config-handler script described above
  2. By the automation solution borgmatic placed in front of borg

I choose the second option and therefore write the following commands in borgmatic.d/config.yml:

# borgmatic.d/config.yml
# List of one or more shell commands or scripts to execute before
# creating a backup, run once per repository.
before_backup:
    - echo "Triggering container shutdown for backup."
    - touch /etc/automation-hooks-trigger/backup-pre.txt
    - sleep 20
    - echo "Assuming container shutdown complete. Creating the backup now."
# [...]
after_backup:
    - echo "Triggering container restart after backup."
    - touch /etc/automation-hooks-trigger/backup-post.txt
    - sleep 10
    - echo "Assuming container restart complete. Exiting."
# [...]

For this to work properly, I have to create a volume in the associated docker-compose.yml and point it at the path in the server’s file system: ${VOLUME_UPDATE_TRIGGER}:/etc/automation-hooks-trigger. Now I set up the other side of these triggers: handlers monitor the trigger files for changes and call action scripts accordingly. They look nearly identical to server-config-handler.sh, only the actionfile and triggerfile paths differ.
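That volume entry could look roughly like this in the compose file (a sketch: only the environment variable and the container path are from the article, the service structure around them is assumed):

```yaml
# docker-compose.yml of the borgmatic service (fragment, structure assumed)
services:
  borgmatic:
    # [...]
    volumes:
      # host folder (from .env) mounted where borgmatic touches its triggers
      - ${VOLUME_UPDATE_TRIGGER}:/etc/automation-hooks-trigger
```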

Creating the backup

If I were to address borg directly, the backup could be created using create. To do this, I would have to specify the repository in which the backup copy should be saved and, after the :: separator, the archive name - here config-update. The folders to be backed up are then listed.

borg create /path/to/repo::config-update ~/opt

I use borgmatic, which takes a lot of work off my hands via the configuration file. However, I have to execute the command in the container. To better check whether everything is working, I print verbose statistics to the console (--stats -v 1) and list the copied files (--files).

docker exec borgmatic sh -c "cd && borgmatic --stats -v 1 --files 2>&1"

I add this line to the automation script.

Second test to create the backup

If everything works now, the complete process looks like this:

  1. After the changed files have been received by Gitea’s automation, act_runner (in the Docker container) stores the files on the server and then sets the server-config-update trigger.
  2. Within 10 seconds, the trigger is recognized by server-config-handler.sh, which then calls server-config-action.sh.
  3. borgmatic (in the Docker container) is instructed to create a backup. This in turn writes the backup-pre trigger.
  4. Again within 10 seconds, backup-pre-handler.sh calls the backup-pre-action.sh script and stops all containers except borgmatic.
  5. Due to the built-in delay, borgmatic waits for this and then creates the backup.
  6. After the backup, borgmatic writes the backup-post trigger.
  7. Within another 10 seconds, backup-post-handler.sh recognizes the changed file and restarts all containers via backup-post-action.sh.
  8. borgmatic reports back to server-config-action.sh whether any errors have occurred in the process so far. If not, it continues.
  9. All Docker containers are stopped.
  10. The server configuration is rolled out to the appropriate locations.
  11. All Docker containers are restarted with the new configuration.

A first success: These scripts are already running on my laptop up to point 6:

schallbert@laptop:               touch server-config-update.txt
server-config-handler@laptop: --- RUN backup ---
borgmatic@docker:                /etc/borgmatic.d/config.yml: Running 4 commands for pre-backup hook
                                 Triggering container shutdown for backup.             
backup-pre-handler.sh@laptop:    --- RUN backup-pre-action.sh ---
                                 Shutting down containers for backup:
                                 watchtower...
                                 [...]
                                 complete.
borgmatic@docker:                Assuming container shutdown complete. Creating the backup now.
                                 local: Creating archive
                                 Failed to create/acquire the lock /mnt/repository/lock.exclusive

Troubleshooting for “failed to acquire the lock”

This problem occurs for me when borg hits an error during a backup that aborts the program. The repository lock is then apparently not released correctly, so after restarting the container it remains reserved for the old, now non-existent container. The following command solves this:

docker exec borgmatic sh -c "cd && borg break-lock /mnt/repository"

Transferring the configuration

Now the configuration files have to be copied to the correct location on the server. Fortunately, I had already cloned the target folder structure when creating the repository, so I should be able to do this with a single copy command without “hardcoding”. After a bit of online research and a look at the user manual for the copy command man cp, I have my command:

# copy recursively contents of folder "server-config" to "/opt"
sudo cp -r -v /opt/server-config/. /opt 2>&1

This tells the operating system to copy the contents (/.) of the folder server-config recursively (-r), including all subfolders, into the folder opt in the root directory /. cp merges: it creates files that do not yet exist and overwrites existing ones instead of placing copies next to them under the same name. The -v option prints additional details, and 2>&1 redirects the error output to the console.

Every path component must be exactly as shown in the command: a trailing slash on the source would copy the folders redundantly instead of merging them, while still overwriting files. Without the . the folder server-config itself would be created inside the target path.
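The effect of the trailing /. can be reproduced with throwaway folders (demo paths in /tmp):

```shell
# with /. the *contents* are merged into the target;
# without it, the source folder itself lands inside the target
mkdir -p /tmp/cp-demo/src/sub /tmp/cp-demo/dst
echo hello > /tmp/cp-demo/src/sub/file.txt

cp -r /tmp/cp-demo/src/. /tmp/cp-demo/dst   # creates dst/sub/file.txt
cp -r /tmp/cp-demo/src /tmp/cp-demo/dst     # creates dst/src/sub/file.txt
```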

Switching to rsync

Unfortunately, the cp command also copies a few files that I don’t want copied: repository-specific folders such as .gitea, or the folders for triggers and handlers. I only need these under server-config, not directly in opt. So I use the rsync command instead. It offers a -u option so that files are only transferred when the source is newer, and --exclude to leave out files and folders that should not be copied. This looks like this:

rsync -r -u -v --exclude '.*' --exclude 'README.md' --exclude '<otherFolders>' /opt/server-config/. /opt 2>&1

But suddenly gitea no longer starts. Error message:

docker@server: [...] failed to load config file "app.ini": open: permission denied

After a long time of pondering and restarting Docker several times, I see that rsync has copied the source file’s restrictive permissions (-rw-------) to the target file, which cp had not done before. I fix this by running chmod +r app.ini. Everything starts up again as usual! 🎉

Restart all applications

Since I run everything on my server in Docker, two simple commands are sufficient:

docker stop $(docker ps -a -q) 2>&1
# [...roll out config changes...]
docker restart $(docker ps -a -q) 2>&1

The finished automation

My script is now finished and simply calls the action script.

#! /bin/bash
# /opt/server-config/automation-hooks-handler/server-config-handler.sh
#[...]
   if [[ "$ATIME" != "$LTIME" ]]
   then
       echo "--- CONFIG UPDATE TRIGGER detected ---"
       ${actionfile} 2>&1
       LTIME=$ATIME
   fi
   sleep 10
done

The server-config-action script then executes the actions described above:

#! /bin/bash
# /opt/server-config/automation-hooks-handler/server-config-action.sh

echo "--- RUN backup ---"
docker exec borgmatic sh -c "cd && borgmatic --stats -v 1 --files 2>&1"
echo "--- STOP all containers ---"
docker stop $(docker ps -a -q) 2>&1
echo "--- DEPLOY config ---"
rsync -r -u -v --exclude '.*' --exclude 'README.md' --exclude '<otherFolders>' /opt/server-config/. /opt 2>&1
echo "--- RESTART all containers ---"
docker restart $(docker ps -a -q) 2>&1

In order for the whole thing to work reliably, the three handler scripts must run in the background:

  • server-config-handler.sh
  • backup-pre-handler.sh
  • backup-post-handler.sh
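All three can be started from cron like the first handler; the crontab could then look roughly like this (a sketch following the @reboot entry from above):

```
# start all three handler scripts after a reboot
@reboot bash /opt/server-config/automation-hooks-handler/server-config-handler.sh
@reboot bash /opt/server-config/automation-hooks-handler/backup-pre-handler.sh
@reboot bash /opt/server-config/automation-hooks-handler/backup-post-handler.sh
```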

They react to the triggers written by act_runner (from Gitea) or by borgmatic. I now expand the crontab accordingly, and with that the task is done for now. The console output of the entire process looks like this:

--- RUN backup ---
/etc/borgmatic.d/config.yml: Running 4 commands for pre-backup hook
Triggering container shutdown for backup.
--- RUN backup-pre-action.sh ---
Shutting down containers for backup:
gitea...                                                                                           
fail2ban...
complete.
Assuming container shutdown complete. Creating the backup now.
local: Creating archive
<borg archive stats>
/etc/borgmatic.d/config.yml: Running 4 commands for post-backup hook
Triggering container restart after backup.
--- RUN backup-post-action.sh ---
Restarting containers after backup:
fail2ban...
gitea...                              
watchtower...
complete.
Assuming container restart complete. Exiting.
local: Pruning archives
local: Compacting segments
compaction freed about 1.82 MB repository space.
local: Running consistency checks
summary:
/etc/borgmatic.d/config.yml: Successfully ran configuration file
--- STOP all containers ---
<container ids>
--- DEPLOY config ---
sending incremental file list
<files that are copied>
sent 206,558 bytes  received 3,463 bytes  420,042.00 bytes/sec
total size is 193,167  speedup is 0.92
--- RESTART all containers ---
<container ids>

Great! Now all I need is for this log to be delivered to me if something goes wrong.