systemd sucks

How to umount NFS before killing any processes.
or
How to save your state before umounting NFS.
or
The blind and angry leading the blind and angry down a thorny path full of goblins.
April 29, 2017

A narrative, because the reference-style documentation sucks.

So, rudderless Debian installed yet another god-forsaken solipsist piece of over-reaching GNOME-tainted garbage on your system: systemd. And you've got some process like openvpn or a userspace fs daemon or so on that you have been explicitly managing for years. But on shutdown or reboot, you need to run something to clean up before it dies, like umount. If it waits until too late in the shutdown process, your umounts will hang.

This is a very very very common variety of problem. To throw salt in your wounds, systemd is needlessly opaque even about the things that it will share with you.

"This will be easy."

Here's the rough framework for how to make a service unit that runs a script before shutdown. I made a file /etc/systemd/system/greg.service (you might want to avoid naming it something meaningful, because there is probably already an opaque and dysfunctional service with that name, and that will obfuscate everything):

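```
[Unit]
Description=umount nfs to save the world
# Start after networking.service, and therefore stop before it.
After=networking.service

[Service]
# Nothing to do at start; the real work happens in ExecStop.
ExecStart=/bin/true
ExecStop=/root/bin/umountnfs
TimeoutSec=10
Type=oneshot
# Stay "active" after /bin/true exits, so ExecStop runs at shutdown.
RemainAfterExit=yes
```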

The man pages systemd.unit(5) and systemd.service(5) are handy references for this file format. Roughly, After= indicates which service this one is nested inside -- units can be nested, and this one starts after networking.service and therefore stops before it. The ExecStart is executed when it starts, and because of RemainAfterExit=yes it will be considered active even after /bin/true completes. ExecStop is executed when it ends, and because of Type=oneshot, networking.service cannot be terminated until ExecStop has finished (which must happen within TimeoutSec=10 seconds or the ExecStop is killed).

If networking.service actually provides your network facility, congratulations, all you need to do is systemctl start greg.service, and you're done! But you wouldn't be reading this if that were the case. You've decided already that you just need to find the right thing to put in that After= line to make your ExecStop actually get run before your manually-started service is killed. Well, let's take a trip down that rabbit hole.

The most basic status information comes from just running systemctl without arguments (equivalent to list-units). It gives you a useful triple of information for each service:

```
greg.service                loaded active exited
```

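loaded means systemd found and parsed the unit file; it says nothing about whether the unit is supposed to be running. active means that, according to systemd's criteria, the unit is currently running, so its ExecStop still needs to be executed some time in the future. exited is the finer-grained sub-state: the ExecStart process has already finished.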
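If you would rather get those three fields by name than squint at columns, `systemctl show` prints them as key=value pairs. Here is a small sketch that reads that output and reports whether systemd still owes the unit an ExecStop. The function name `unit_state` is made up, and treating only ActiveState=active as "stop still pending" is a simplification.

```bash
#!/bin/bash
# Summarize the key=value lines printed by
#   systemctl show -p LoadState,ActiveState,SubState greg.service
# (read from stdin). Returns success only when ActiveState=active, which this
# sketch treats as "systemd will run ExecStop= when the unit is stopped".
unit_state() {
  local key value load="" active="" sub=""
  while IFS='=' read -r key value; do
    case $key in
      LoadState)   load=$value ;;    # was the unit file found and parsed?
      ActiveState) active=$value ;;  # high-level state: active, inactive, failed...
      SubState)    sub=$value ;;     # type-specific detail: exited, running, dead...
    esac
  done
  echo "load=$load active=$active sub=$sub"
  [ "$active" = active ]
}
```

Feed it with `systemctl show -p LoadState,ActiveState,SubState greg.service | unit_state`. The parser does not care in which order the properties arrive.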

People will tell you to put LogLevel=debug in the [Manager] section of /etc/systemd/system.conf. That will give you a few more clues. There are two important kinds of message about unit shutdown that you can see (maybe in syslog, maybe in journalctl):

```
systemd[4798]: greg.service: Executing: /root/bin/umountnfs
systemd[1]: rsyslog.service: Changed running -> stop-sigterm
```

That is, it tells you when the ExecStart and ExecStop commands run, and when a unit goes into a mode where systemd starts killing off its cgroup (a control group: the kernel's way of tracking a set of processes, which is not the same thing as an old-style Unix process group). But it doesn't tell you which processes are actually killed, and here's the important part: systemd is solipsist. Systemd believes that when it closes its eyes, the whole universe blinks out of existence.
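For the record, flipping that LogLevel switch can be scripted. This is a sketch, assuming a stock Debian /etc/systemd/system.conf whose only section is [Manager] and whose LogLevel line is either commented out or already present. `set_manager_loglevel` is a hypothetical helper, and the file path is a parameter so you can try it on a copy first.

```bash
#!/bin/bash
# Set LogLevel in a systemd manager config file (default: debug).
# Assumes the Debian layout: a single [Manager] section, with LogLevel
# either commented out ("#LogLevel=info") or already set.
set_manager_loglevel() {
  local conf=${1:-/etc/systemd/system.conf} level=${2:-debug}
  if grep -q '^#\?LogLevel=' "$conf"; then
    # Replace the existing (possibly commented-out) line in place.
    sed -i "s/^#\?LogLevel=.*/LogLevel=$level/" "$conf"
  else
    # No LogLevel line at all: append one, relying on [Manager] being the only section.
    printf 'LogLevel=%s\n' "$level" >> "$conf"
  fi
}
```

Run it as root, then `systemctl daemon-reexec` (or reboot) so the manager picks up the new level.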

Once systemd has determined that a process is orphaned -- not associated with any active unit -- it just kills it outright. This is why, if you start a service that forks into the background, you must use Type=forking, because otherwise systemd will consider any forked children of your ExecStart command to be orphans when the top-level ExecStart exits.
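Here is the shape of what Type=forking expects from ExecStart=, as a minimal sketch: start the long-running process in the background, check that it is alive, and return 0. `sleep 600` stands in for a real daemon, and `PIDFILE` and `start_forking` are hypothetical names; a real daemon such as `openvpn --daemon` does its own forking and detaching.

```bash
#!/bin/bash
# The contract Type=forking expects from ExecStart=: launch the real
# long-running process in the background, then have the starting script
# exit 0. systemd treats the start as finished when the script exits, and
# the background process stays in the unit's cgroup.
# PIDFILE is a hypothetical path that a PIDFile= setting could point at.
PIDFILE=${PIDFILE:-/run/greg-example.pid}

start_forking() {
  # Stand-in "daemon", detached from our stdio like a real daemon would be.
  sleep 600 </dev/null >/dev/null 2>&1 &
  local pid=$!
  echo "$pid" > "$PIDFILE" || return 1    # record the main PID for systemd
  kill -0 "$pid" 2>/dev/null || return 1  # make sure it actually came up
  return 0                                # caller exits 0, child lives on
}
```

If the unit also sets PIDFile= to the same path, systemd does not have to guess which process is the main one.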

So, very early in shutdown, it transitions a ton of processes into the orphaned category and kills them without explanation. And it is nigh unto impossible to tell how a given process becomes orphaned. Is it because a unit associated with the top level process (like getty) transitioned to stop-sigterm, and then after getty died, all of its children became orphans? If that were the case, it seems like you could simply add to your After rule.

```
After=networking.service getty.target
```

For example, my openvpn process was started from /etc/rc.local, so systemd considers it part of the unit rc-local.service (defined in /lib/systemd/system/rc-local.service). So After=rc-local.service saves the day!

Not so fast! The openvpn process is started from /etc/rc.local on bootup, but on resume from sleep it winds up being executed from /etc/acpi/actions/lm_lid.sh. And if it failed for some reason, then I start it again manually under su.

So the inclination is to just make a longer After= line:

```
After=networking.service getty.target acpid.service
```

Maybe getty@.service? Maybe systemd-user-sessions.service? How about adding all the items from After= to Requires= too? Sadly, no. It seems that anyone who goes down this road meets with failure. But I did find something which might help you if you really want to:

```
systemctl status 1234
```

That will tell you what unit systemd thinks that pid 1234 belongs to. For example, an openvpn started under su winds up owned by /run/systemd/transient/session-c1.scope. Does that mean if I put After=session-c1.scope, I would win? I have no idea, but I have even less faith. systemd is meddlesome garbage, and this is not the correct way to pay fealty to it.
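The same membership information is visible without systemctl in /proc/PID/cgroup. A sketch, assuming the usual systemd layout where the deepest `*.service` or `*.scope` component of the cgroup path is the owning unit. `unit_of_pid` is a made-up helper, and its optional second argument (a saved copy of the file) exists only for testing.

```bash
#!/bin/bash
# Print the systemd unit (a *.service or *.scope) that a process belongs to,
# by reading /proc/PID/cgroup. Handles the cgroup-v1 "N:name=systemd:/path"
# line and the cgroup-v2 "0::/path" line.
unit_of_pid() {
  local file=${2:-/proc/$1/cgroup} line path part unit=""
  local -a parts
  while IFS= read -r line; do
    case $line in
      *:name=systemd:*|0::*) path=${line#*:*:} ;;  # drop "ID:controllers:"
      *) continue ;;                               # other v1 controllers
    esac
    IFS=/ read -ra parts <<< "$path"
    for part in "${parts[@]}"; do
      case $part in
        *.service|*.scope) unit=$part ;;  # keep the deepest unit in the path
      esac
    done
  done < "$file"
  [ -n "$unit" ] && echo "$unit"
}
```

`unit_of_pid 1234` would print something like session-c1.scope for an openvpn started under su, matching what `systemctl status 1234` reports.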

I'd love to know what you can put in After= to actually run before vast and random chunks of userland get killed, but I am a mere mortal and systemd has closed its eyes to my existence. I have forsaken that road.

I give up, I will let systemd manage the service, but I'll do it my way!

What you really want is to put your process in an explicit cgroup, and then you can control it easily enough. And luckily that is not inordinately difficult, though systemd still has surprises up its sleeve for you.

So this is what I wound up with, in /etc/systemd/system/greg.service:

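```
[Unit]
Description=openvpn and nfs mounts
After=networking.service

[Service]
# Starts openvpn --daemon and mounts NFS; systemd considers the start
# finished when this script exits.
ExecStart=/root/bin/openvpn_start
ExecStop=/root/bin/umountnfs
TimeoutSec=10
Type=forking
```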

Here's roughly the narrative of how all that plays out:

When my wlan-discovery script is ready to start openvpn, it executes systemctl start greg.service.

Then /root/bin/openvpn_start is executed:

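```
#!/bin/bash
# --daemon makes openvpn fork into the background, as Type=forking expects.
openvpn --daemon --config ... --remote $(cat /run/openvpn/remoteinfo)
( echo 'nameserver 10.1.0.1'; echo 'search myvpn' ) | resolvconf -a tun0
# Mount the share unless /nfsmnt already shows up as a mount point.
mount | grep -q ' /nfsmnt ' || mount -t nfs -o ... server:/export /nfsmnt
# Always succeed, so a failed mount doesn't make systemd think openvpn failed.
exit 0
```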

Because of Type=forking, systemd considers greg.service to be active as long as the forked openvpn is running. (Note: the exit 0 is important. If the script ends with a non-zero exit code from the mount command, systemd doesn't consider the service to be running.)
Then a multitude of events cause the wlan-discovery script to run again, and it does a killall -9 openvpn. systemd sees the SIGCHLD from that, decides greg.service is done, and invokes /root/bin/umountnfs:
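```
#!/bin/sh
# systemd passes EXIT_STATUS to ExecStop commands; it is "KILL" when the
# main process (openvpn) was killed with SIGKILL, e.g. by killall -9.
if [ "$EXIT_STATUS" != "KILL" ]
then
	umount.nfs -f /nfsmnt
fi
```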
umountnfs does nothing, because $EXIT_STATUS is KILL (more on this later).
wlan-discovery finishes connecting and restarts greg.service.
Eventually, system shutdown occurs and stops greg.service, executing /root/bin/umountnfs, but this time without EXIT_STATUS=KILL, and it successfully umounts.
Then systemd kills the openvpn process, which is still in greg.service's cgroup.
While /root/bin/umountnfs is executing, I think that all of your other shutdown is occurring in parallel.

So, this EXIT_STATUS hack... If I had made the NFS its own service, it might be strictly nested within the openvpn service, but that isn't actually what I desire -- I want the NFS mounts to stick around until we are shutting down, on the assumption that at all other times, we are on the verge of openvpn restoring the connection. So I use the EXIT_STATUS to determine whether umountnfs is being called because of shutdown or just because openvpn died (anyway, the umount won't succeed if openvpn is already dead!). You might want to add export > /tmp/foo to umountnfs to see which environment variables are defined.
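If you want the "shutdown versus openvpn died" test to be broader than a single signal name, one option is to key off EXIT_CODE instead. This sketch assumes that systemd sets EXIT_CODE and EXIT_STATUS for ExecStop= only after the main process has exited (check systemd.exec(5) for your version before relying on it); `umount_decision` is a hypothetical helper.

```bash
#!/bin/bash
# Assumption: EXIT_CODE/EXIT_STATUS are only set for ExecStop= once the
# service's main process has exited. Then an empty EXIT_CODE means openvpn
# is still alive and the unit is being stopped on purpose (shutdown or
# systemctl stop), so the NFS mounts should go. Any death of openvpn
# (killed, crashed, or exited) leaves them mounted.
# To see what systemd really passes, temporarily add: env > /tmp/umountnfs.env
umount_decision() {
  if [ -z "${EXIT_CODE:-}" ]; then
    echo umount
  else
    echo "skip ($EXIT_CODE/${EXIT_STATUS:-?})"
  fi
}
```

In umountnfs itself that becomes `[ "$(umount_decision)" = umount ] && umount.nfs -f /nfsmnt`. The trade-off: unlike the KILL test, this also leaves the mount alone when openvpn exits on its own or dies from SIGTERM.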

And there is a huge caveat here: if something else in the shutdown process interferes with the network, such as a call to ifdown, then we will need to be After= that as well. And, worse, the documentation doesn't say (and user reports vary wildly) whether it will wait until your ExecStop completes before starting the dependent ExecStop. My experiments suggest Type=oneshot will cause that sort of delay...not so sure about Type=forking.

Fine, sure, whatever. Let's sing Kumbaya with systemd.

I have the idea that Wants= vs. Requires= will let us use two services and do it almost how a real systemd fan would do it. So here are my files:

/etc/systemd/system/greg-openvpn.service:

```
[Unit]
Description=openvpn
Requires=networking.service
After=networking.service

[Service]
ExecStart=/root/bin/openvpn_start
TimeoutSec=10
Type=forking
```

/etc/systemd/system/greg-nfs.service:

```
[Unit]
Description=nfs mounts
Wants=greg-openvpn.service
After=greg-openvpn.service

[Service]
ExecStart=/root/bin/mountnfs
ExecStop=/root/bin/umountnfs
TimeoutSec=10
Type=oneshot
RemainAfterExit=yes
```

/root/bin/openvpn_start:

```
#!/bin/bash
# Same as before, minus the NFS mount, which now lives in greg-nfs.service.
openvpn --daemon --config ... --remote $(cat /run/openvpn/remoteinfo)
( echo 'nameserver 10.1.0.1'; echo 'search myvpn' ) | resolvconf -a tun0
```

/root/bin/mountnfs:

```
#!/bin/sh
# Mount the share unless /nfsmnt already shows up as a mount point.
mount | grep -q ' /nfsmnt ' || mount -t nfs -o ... server:/export /nfsmnt
exit 0
```

/root/bin/umountnfs:
```
#!/bin/sh
umount.nfs -f /nfsmnt
```

Then I replace the killall -9 openvpn with systemctl stop greg-openvpn.service, and I replace systemctl start greg.service with systemctl start greg-nfs.service, and that's it.

The Requires=networking.service enforces the strict nesting rule. If you run systemctl stop networking.service, for example, it will stop greg-openvpn.service first.

On the other hand, Wants=greg-openvpn.service is not as strict. On systemctl start greg-nfs.service, it launches greg-openvpn.service, even if greg-nfs.service is already active. But if greg-openvpn.service stops or dies or fails, greg-nfs.service is unaffected, which is exactly what we want. The icing on the cake is that if greg-nfs.service is going down anyway, and greg-openvpn.service is running, then it won't stop greg-openvpn.service (or networking.service) until after /root/bin/umountnfs is done.
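To make the new control flow concrete, here is how the two substitutions might look inside the wlan-discovery script. That script isn't shown in this post, so the function names and the `SYSTEMCTL` override (handy for a dry run with `SYSTEMCTL=echo`) are hypothetical.

```bash
#!/bin/bash
# Hypothetical fragment of the wlan-discovery script after the switch to two
# units. SYSTEMCTL defaults to the real systemctl; overriding it (for example
# with echo) gives a dry run that just prints the commands.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

vpn_link_lost() {
  # Was: killall -9 openvpn. Stops only the VPN unit; greg-nfs.service merely
  # Wants= it, so the NFS mounts stay put.
  "$SYSTEMCTL" stop greg-openvpn.service
}

vpn_link_ready() {
  # Was: systemctl start greg.service. Starting the NFS unit pulls
  # greg-openvpn.service back in through Wants=.
  "$SYSTEMCTL" start greg-nfs.service
}
```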

Exactly the behavior I wanted. Exactly the behavior I've had for 14 years with a couple readable shell scripts. Great, now I've learned another fly-by-night proprietary system.

GNOME, you're as bad as Mac OS X. No, really. In February of 2006 I went through almost identical trouble learning Apple's configd and Kicker for almost exactly the same purpose, and never used that knowledge again -- Kicker had already been officially deprecated before I even learned how to use it. People who will fix what isn't broken never stop.

As an aside - Allan Nathanson at Apple was a way friendlier guy to talk to than Lennart Poettering is. Of course, that's easy for Allan -- he isn't universally reviled.

A side story

If you've had systemd foisted on you, odds are you've got Adwaita theme too.

```
rm -rf /usr/share/icons/Adwaita/cursors/
```

You're welcome. Especially if you were using one of the X servers where animated cursors are a DoS.

People who will fix what isn't broken never stop.

[update August 10, 2017]

I found out the reason my laptop double-unsuspends and other crazy behavior is systemd. I found out systemd has hacks that enable a service to call into it through dbus and tell it not to be stupid, but those hacks have to be done as a service! You can't just run dbus on the commandline, or edit a config file. So in a fit of pique I followed the directions for uninstalling systemd (mirrored).

It worked marvelously and everything bad fixed itself immediately. The coolest part is restoring my hack to run openvpn without systemd didn't take any effort or thought, even though I had not bothered to preserve the original shell script. Unix provides some really powerful, simple, and *general* paradigms for process management. You really do already know it. It really is easy to use.

I've been using sysvinit on my laptop for several weeks now. Come on in, the water's warm!

So this is still a valuable tutorial for using systemd, but the steps have been reduced to one: DON'T.

[update September 27, 2017]

systemd reinvents the system log as a "journal", which is a binary format log that is hard to read with standard command-line tools. This was irritating to me from the start because systemd components are staggeringly verbose, and all that shit gets sent to the console when the services start/stop in the wrong order such that the journal daemon isn't available. (side note, despite the intense verbosity, it is impossible to learn anything useful about why systemd is doing what it is doing)

What could possibly motivate such a fundamental redesign? I can think of two things off the top of my head: The need to handle such tremendous verbosity efficiently, and the need to support laptops. The first need is obviously bullshit, right -- a mistake in search of a problem. But laptops do present a logging challenge. Most laptops sleep during the night and thus never run nightly maintenance (which is configured to run at 6am on my laptop). So nothing ever rotates the logs and they just keep getting bigger and bigger and bigger.

But still, that doesn't call for a ground-up redesign, an unreadable binary format, and certainly not deeper integration. There are so many regular userland hacks that would resolve such a requirement. But never mind, because.
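For example, a sketch of one such hack, assuming logrotate and a stamp file at a made-up path: run it from whatever already fires on resume (like the ACPI lid script that restarts openvpn), and it rotates only if the last successful run is more than a day old. `STAMP`, `MAX_AGE`, `rotation_due` and `maybe_rotate` are all hypothetical names.

```bash
#!/bin/bash
# Rotate logs on resume/boot if the nightly run was slept through.
# STAMP, MAX_AGE and the logrotate config path are assumptions.
STAMP=${STAMP:-/var/lib/misc/rotate-on-resume.stamp}
MAX_AGE=${MAX_AGE:-86400}   # seconds: rotate at most about once a day

rotation_due() {
  local now last
  [ -e "$STAMP" ] || return 0   # never rotated by this hook: due now
  now=$(date +%s)
  last=$(stat -c %Y "$STAMP")   # mtime of the last successful run
  [ $(( now - last )) -ge "$MAX_AGE" ]
}

maybe_rotate() {
  if rotation_due; then
    # Only refresh the stamp if logrotate actually succeeded.
    logrotate /etc/logrotate.conf && touch "$STAMP"
  fi
}
```

Calling `maybe_rotate` as root from the resume script is the whole integration; no new log format required.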

I went space-hunting on my laptop yesterday and found an 800MB journal. Since I've removed systemd, I couldn't read it to see how much time it had covered, but let me just say, they didn't solve the problem. It was neither an efficient representation where the verbosity cost is ameliorated, nor a laptop-aware logging system.

When people are serious about re-inventing core Unix utilities, like ChromeOS or Android, they solve the log-rotation-on-laptops problem.