How to monitor cronjob scheduling and execution
Every complex application needs some functionality that runs on a schedule. It can be implemented in many ways: system cronjobs, Kubernetes CronJobs, Celery periodic tasks, and so on.
Our product has a couple of critical cronjobs, and our business depends on tracking their executions and being notified when an expected run is skipped, slowed down, or postponed for any reason. Monitoring application errors, performance, or metrics of cronjobs (with APM tools like ElasticAPM) becomes a standard task if you follow the Admin processes principle from the famous Twelve-Factor App methodology. Controlling the schedule is trickier.
A few services and open-source tools on the market address this issue: the commercial https://cronitor.io/ (a good tool, but the free quota is too limited) and the open-source/SaaS https://healthchecks.io/
healthchecks.io
I am going to describe my experience with the latter. It is available as a SaaS service with a very liberal free quota or as an open-source self-hosted package. In the SaaS version, you can monitor up to twenty cronjobs for free.
The interface for adding a check is simple and intuitive.
Sending webhooks
There are multiple ways to track cronjob execution. I am going to describe only one of them. For the rest, check the documentation.
Every check created in the application has a unique webhook URL, and you can send signals to it at various stages of the cronjob execution.
Cronjob started
curl -XGET https://hc-ping.com/<uuid>/start
Cronjob failed
curl -XGET https://hc-ping.com/<uuid>/fail
Cronjob successfully finished
curl -XGET https://hc-ping.com/<uuid>
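Optionally, you can add a URL parameter rid=<execution_id>, a run ID (a UUID you generate for each execution). Sending the same run ID with the start ping and the completion ping lets the service match them, which allows it to track cron execution time more accurately and to aggregate statistics on that metric.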
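The three pings above can be combined into a small wrapper around the cron command. The following is only a sketch under stated assumptions: the function names hc_url, hc_ping, and hc_run and the HC_BASE variable are made up for illustration, uuidgen is assumed to be installed for generating the run ID, and the curl flags (fail on HTTP errors, 10-second timeout, a few retries) are just one reasonable choice.

```shell
#!/bin/sh
# Illustrative wrapper, not an official healthchecks.io client.
# HC_BASE and all function names are assumptions made for this example.
HC_BASE="${HC_BASE:-https://hc-ping.com}"

# Build a ping URL from a check UUID, an optional event
# ("start", "fail", or empty for success), and an optional run ID.
hc_url() {
    url="$HC_BASE/$1"
    if [ -n "${2:-}" ]; then url="$url/$2"; fi
    if [ -n "${3:-}" ]; then url="$url?rid=$3"; fi
    printf '%s\n' "$url"
}

# Send one ping. A failed ping must never break the job itself,
# so any curl error is swallowed.
hc_ping() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$(hc_url "$@")" || true
}

# Run a command between a start ping and a success or fail ping.
hc_run() {
    check_uuid=$1
    shift
    # Assumption: uuidgen exists; if it does not, rid is simply omitted.
    run_id=$(uuidgen 2>/dev/null || true)
    hc_ping "$check_uuid" start "$run_id"
    if "$@"; then
        hc_ping "$check_uuid" "" "$run_id"
    else
        status=$?
        hc_ping "$check_uuid" fail "$run_id"
        return "$status"
    fi
}
```

Assuming the functions are saved in a file such as /opt/scripts/hc.sh (a hypothetical path), a crontab entry could look like: 0 3 * * * . /opt/scripts/hc.sh && hc_run <uuid> /opt/scripts/backup.sh. The wrapper returns the exit status of the wrapped command, so anything else that inspects the job result keeps working.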
Dashboard
After adding a check and implementing webhooks, you will start receiving notifications when your cronjob does not start (or finish) on time. You will also have a full log of pings, an overview of the check settings, and a summary view.
Integrations and notifications
With the plugins and integrations supported by the application, you can send warnings into a channel of your choice.
You can also embed status badges in your online documentation or dashboards.
Monthly report
Once a month, you will receive an overview of the last month compared to the month before it.
API
To manage your monitoring setup, you can use the versatile HTTP API.
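As a rough illustration of the management API, the sketch below creates a check for a cron schedule. It rests on several assumptions that should be verified against the API documentation: the endpoint https://healthchecks.io/api/v3/checks/, the X-Api-Key header, and the name, schedule, tz, grace, and unique fields. The helper names hc_check_payload and hc_create_check and the HC_API_KEY and HC_API_URL variables are hypothetical.

```shell
#!/bin/sh
# Illustrative only: helper names and environment variables are
# assumptions for this example, not part of any official tooling.

# Print a JSON body for a cron-scheduled check.
# $1 = check name, $2 = cron expression, $3 = grace period in seconds.
# No JSON escaping is done, so keep quotes and backslashes out of the name.
hc_check_payload() {
    printf '{"name": "%s", "schedule": "%s", "tz": "UTC", "grace": %s, "unique": ["name"]}\n' \
        "$1" "$2" "$3"
}

# POST the payload to the management API.
# Assumption: HC_API_KEY holds a read-write API key for the project.
hc_create_check() {
    curl -fsS -m 10 \
        --header "X-Api-Key: $HC_API_KEY" \
        --header "Content-Type: application/json" \
        --data "$(hc_check_payload "$@")" \
        "${HC_API_URL:-https://healthchecks.io/api/v3/checks/}"
}
```

A call such as hc_create_check nightly-backup "0 3 * * *" 900 would then request a check that expects a run every day at 03:00 UTC with a 15-minute grace period. The "unique" field is meant to make the call repeatable from a deployment script: an existing check with the same name should be updated rather than duplicated.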
Conclusion
- This simple application dramatically improved observability for our mission-critical cronjobs.
- It is not clear why this functionality is not yet a part of the Elasticsearch Observability Stack.
- It would be nice to have auto-provisioning and auto-configuration for the checks (so that you can add a cronjob to your application and have a check created and configured automatically by the first webhook request). But the author of the app decided not to implement this feature.
- You can start with the SaaS version, and later, when you exceed the free quota, you can either switch to a paid version (prices are completely reasonable) or install a self-hosted instance.