Heisenbugs and Shell Madness
Introduction
What is a Heisenbug, you say? For the non-programmer, a Heisenbug is a type of computer bug that disappears or changes behaviour when an attempt is made to debug it. The name is a pun on the Heisenberg uncertainty principle, which can be loosely understood as a trade-off between how precisely certain pairs of a particle's physical properties can be measured. As you try to precisely measure the location of a particle, its momentum becomes more uncertain, and vice versa. Similarly, as attempts are made to precisely probe the Heisenbug, the problem disappears.
Automating Backups
I encountered this type of bug while I was creating an automated backup solution, and it ultimately took a few weeks to fully understand and debug. My backup source was a storage service called Keybase File System (KBFS), which gives every Keybase user 200GB of free, encrypted cloud storage. Unfortunately, Keybase lets anyone who guesses the password reset the account, which deletes all the stored files, messages, and data (so much for strong encryption). Obviously, I didn't want my password to be the single point of failure. Using BorgBackup, the KBFS files can be sent to another secure storage provider, and I even wrote a blog post last month on how to use it. The backup script would work if I ran it manually from my terminal window, but surprisingly, it would fail to run if I put the script into my crontab (an automated task scheduler).
What gives? I tried running the crontab script as root, which also didn't work because KBFS needed config files from my normal user. After some StackExchange sleuthing, I tried recreating all the environment variables in the cron script (such as PATH, HOME) from my regular shell. I then patiently waited for the crontab script to run. I checked the backup and bingo bango it worked! I logged off and called it a night.
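When a script works in a terminal but not under cron, it can help to go the other way and reproduce cron's sparse environment by hand. The following is a minimal sketch, not something from my original setup: the function name run_like_cron is hypothetical, and the PATH and SHELL values are only typical defaults, so check what your own cron implementation actually sets.

```shell
#!/usr/bin/env bash
# Run a command string with a sparse, cron-like environment.
# Assumption: PATH=/usr/bin:/bin and SHELL=/bin/sh are typical cron
# defaults; your cron implementation may use different values.
run_like_cron() {
    env -i \
        HOME="$HOME" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin \
        SHELL=/bin/sh \
        /bin/sh -c "$1"
}

# Show the variables a command would see, for comparison with `env | sort`
# in your regular shell.
run_like_cron 'env | sort'
```

Running the backup script through a function like this shows quickly whether a missing variable is the culprit. In my case it was only part of the story.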
Attack of the Heisenbug
Unfortunately, when I checked the logs again the following week, I noticed the backups were failing again. After trying a multitude of commands, flags, and options, I noticed that whenever I SSH'd into the machine to debug and analyze the script, the bug would disappear. At some point, I thought the script might require the presence of an interactive shell (input from keyboard and output to screen), so I tried running the script through GNU Screen, but it still displayed the same odd behaviour. I was dealing with a nasty Heisenbug 😵.
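Since logging in to watch the script changed its behaviour, one option is to have the script record its own context instead. This is a minimal sketch under my own assumptions: the function name log_run and the log path in the usage comment are hypothetical, and it only captures output, exit status, and whether standard input is a terminal.

```shell
#!/usr/bin/env bash
# Run a command and append its output, exit status, and some session
# context to a log file, so nobody has to be logged in to observe it.
log_run() {
    local logfile=$1
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') running: $*"
        if [ -t 0 ]; then
            echo "stdin: terminal"
        else
            echo "stdin: not a terminal"
        fi
        "$@"
        local status=$?
        echo "=== exit status: $status"
        return "$status"
    } >>"$logfile" 2>&1
}

# Hypothetical usage from a crontab entry or wrapper script:
#   log_run /tmp/backup-debug.log ./backup.sh
```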
As I was watching the activity of the script through htop, I noticed something peculiar about my own shell. My bash shell had a hyphen in the process name, while all the other shells didn't. What was this hyphen!?
Shell Madness...
As I found out the hard way, shells are complicated, and there are no fewer than four types of shells that can be spawned. A hyphen as the first character of the shell's zeroth argument (argv[0][0]), which is the name shown in the process list, signifies that it should start as a login shell. This means it will run one-time scripts, for example initializing your shiny desktop environment. Here are the different types of shells:
- Interactive Login: these are created when you log in for the first time, either through the physical keyboard or through another method such as SSH
- Interactive Non-Login: these are created whenever you start a new terminal window, for example when you click Terminal in your GNOME sidebar, or when you use GNU Screen
- Non-Interactive Login: rarely used
- Non-Interactive Non-Login: created whenever you run a script; these make up the majority of background tasks that don't require user input
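A script can report which of these four categories its own bash process falls into. The sketch below is a minimal illustration, assuming bash: the function name shell_type is hypothetical, while the $- flags, shopt -q login_shell, and exec -l are standard bash features.

```shell
#!/usr/bin/env bash
# Print which of the four shell types the current bash process is.
shell_type() {
    local interactive login
    # $- holds the current option flags; "i" is present for interactive shells.
    case $- in
        *i*) interactive="Interactive" ;;
        *)   interactive="Non-Interactive" ;;
    esac
    # bash sets the read-only login_shell option when started as a login shell.
    if shopt -q login_shell; then
        login="Login"
    else
        login="Non-Login"
    fi
    echo "$interactive $login"
}

# Report the type of the shell running this script.
shell_type

# `exec -l` puts a hyphen at the start of argv[0], which is what the login
# program does. --noprofile keeps the profile files from being read.
(exec -l bash --noprofile -c "$(declare -f shell_type); shell_type")
```

Dropping a call like this at the top of a misbehaving script, with the output sent to a log, shows which kind of shell cron (or anything else) actually gave it.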
Crontab normally runs scripts as Non-Interactive Non-Login, which didn't work. When I tried running the script with GNU Screen, it didn't work either. I also tried running the script as a Non-Interactive Login shell with the bash --login shebang, which also didn't work.
It turns out that KBFS was checking for the presence of an Interactive Login shell before it would operate properly. The solution is to emulate an interactive login shell. Here is my newly updated root crontab:
0 0 * * * echo ./backup.sh | su - yoonsik
su - yoonsik starts an Interactive Login shell, which we then pretend to type into using the echo command.
TL;DR
To make a script believe that it's being run from a regular shell, as if you had typed it into your own physical console, use the following command:
# as root
echo ./script.sh | su - username_here