Note: I’m now using an updated systemd service file; see the bottom of this post.
Things go wrong with technology; it’s bound to happen. I needed a simple way to monitor scheduled jobs, most of which are systemd services started by systemd timers.
While looking for a monitoring system I found healthchecks.io, which allows you to host your own instance. The next step was to trigger a ping when a systemd service’s status changed. For this I used the OnSuccess/OnFailure hooks with a few custom services.
An example of a service file I’m monitoring:
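As a minimal sketch of such a unit (not necessarily the exact file from my setup), assuming systemd 249 or later, where OnSuccess= is available. The unit name, script path and UUID below are placeholders:

```ini
# backup.service (hypothetical name)
[Unit]
Description=Nightly backup job
# The instance name after "@" is the check's ping UUID (placeholder shown).
OnSuccess=healthcheck@00000000-0000-0000-0000-000000000000.service
OnFailure=healthcheck-failure@00000000-0000-0000-0000-000000000000.service

[Service]
Type=oneshot
# Hypothetical job script.
ExecStart=/usr/local/bin/backup.sh
```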
We pass the ping UUID to both healthcheck@.service and healthcheck-failure@.service as an argument. These services simply run curl to ping the healthchecks.io instance:
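A rough sketch of what the two template units might look like. The host hc.example.com is a placeholder for the self-hosted instance, the /ping/ path assumes the default ping endpoint, and the curl flags are one reasonable choice rather than a requirement:

```ini
# healthcheck@.service (success ping)
[Unit]
Description=Report success to Healthchecks for check %i

[Service]
Type=oneshot
# %i expands to the instance name, i.e. the ping UUID.
ExecStart=/usr/bin/curl -fsS -m 10 --retry 5 -o /dev/null https://hc.example.com/ping/%i

# healthcheck-failure@.service (failure ping, kept in a separate file)
[Unit]
Description=Report failure to Healthchecks for check %i

[Service]
Type=oneshot
# Appending /fail to the ping URL signals a failure.
ExecStart=/usr/bin/curl -fsS -m 10 --retry 5 -o /dev/null https://hc.example.com/ping/%i/fail
```

Using %i rather than %I matters here: %I unescapes the instance name, which would turn the hyphens in the UUID into slashes.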
Combined with the integrations healthchecks.io offers, I’m now notified when something goes wrong. I prefer the simplicity of this setup because it can be integrated with other tooling, for example Vorta/borg, which I use for machine backups:
In future I may look at attaching logs and/or trying to migrate to a single healthcheck service.
Updated 2023-01-18
Based on a colleague’s feedback I’m now using this updated systemd template service file.
This allows me to use the same template file, whilst supporting the start and logging options of Healthchecks.io:
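A hedged sketch of how a single template could handle all three cases, assuming systemd 251 or later for the $MONITOR_* variables. The instance name carries the UUID and an action separated by a colon; the host, the unit names, the Wants=/After= wiring for the start ping and the short log body are all assumptions for illustration, not necessarily what my real file does:

```ini
# backup.service (hypothetical monitored unit; the UUID is a placeholder)
[Unit]
Description=Nightly backup job
# Assumption: one possible way to send the start ping before the job runs.
Wants=healthcheck@00000000-0000-0000-0000-000000000000:start.service
After=healthcheck@00000000-0000-0000-0000-000000000000:start.service
OnSuccess=healthcheck@00000000-0000-0000-0000-000000000000:success.service
OnFailure=healthcheck@00000000-0000-0000-0000-000000000000:failure.service

# healthcheck@.service (separate file; instance name is "<uuid>:<action>")
[Unit]
Description=Ping Healthchecks (%i)

[Service]
Type=oneshot
# "$$" passes a literal "$" through to the shell; %i arrives there as "$1".
# The request body is stored by Healthchecks as the log for this ping.
ExecStart=/bin/sh -c '\
  uuid=$$(echo "$$1" | cut -d: -f1); \
  action=$$(echo "$$1" | cut -d: -f2); \
  case "$$action" in \
    start) suffix=/start ;; \
    failure) suffix=/fail ;; \
    *) suffix= ;; \
  esac; \
  exec curl -fsS -m 10 --retry 5 -o /dev/null \
    --data-raw "unit=$$MONITOR_UNIT result=$$MONITOR_SERVICE_RESULT" \
    "https://hc.example.com/ping/$$uuid$$suffix"' sh %i
```

Keeping the shell free of "%" and writing every "$" as "$$" avoids clashes with systemd’s own specifier and variable expansion in ExecStart= lines.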
The :failure, :success and :start suffixes are important, as without them the $MONITOR_* environment variables are not passed through to the service. See this quote from the manual:
$MONITOR_SERVICE_RESULT, $MONITOR_EXIT_CODE, $MONITOR_EXIT_STATUS, $MONITOR_INVOCATION_ID, $MONITOR_UNIT
Only defined for the service unit type. Those environment variables are passed to all ExecStart= and ExecStartPre= processes which run in services triggered by OnFailure= or OnSuccess= dependencies.
Variables $MONITOR_SERVICE_RESULT, $MONITOR_EXIT_CODE and $MONITOR_EXIT_STATUS take the same values as for ExecStop= and ExecStopPost= processes. Variables $MONITOR_INVOCATION_ID and $MONITOR_UNIT are set to the invocation id and unit name of the service which triggered the dependency.
Note that when multiple services trigger the same unit, those variables will be not be passed. Consider using a template handler unit for that case instead: “OnFailure=handler@%n.service” for non-templated units, or “OnFailure=handler@%p-%i.service” for templated units.