Something we've found to be fairly lightweight (compared to, e.g., Chronos) but incredibly featureful is using Jenkins (the CI server) as a cron runner. We use Jenkins Job Builder (http://docs.openstack.org/infra/jenkins-job-builder/) to configure it at deploy time, so the job configuration lives as part of the deploy rather than in system config.
Once I had a runaway job fill the disk with logs. Since Jenkins couldn't write to the disk anymore, it stopped working completely: no jobs ran and, more importantly, no notifications went out. Funnily enough, there was a job monitoring free disk space, but the runaway app wrote ~100GB in less than 15 minutes (damn SSDs :p).
Another time (several times, actually), the OOM killer killed a Jenkins-related process. Jenkins being a JVM-based app that starts out using about 1GB of RAM doesn't help, I guess. This led Jenkins to hang on a job: the timeout didn't work, and I couldn't even stop the job manually. Other jobs wouldn't start, and once again no notifications were sent.
For those who prefer a self-hosted OSS monitoring solution, Jenkins is a good multi-purpose choice (it does more than continuous integration!).
I inherited a legacy application with tons of cron jobs running scripts on the production server. Instead of risking a move of our jobs to Jenkins, we simply use Jenkins' POST endpoint to send job results from the cron jobs themselves. It's not perfect, and it doesn't give us all the goodies listed above, but it does give us more visibility into the jobs until we can move them all off reliably. +1 from me if you're in a similar situation.
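The comment doesn't say exactly which endpoint is used. A minimal sketch of a cron-side wrapper, assuming the target is a Jenkins "external job" (from the external-monitor-job plugin) whose `postBuildResult` endpoint accepts a small XML document with a hex-encoded log, an exit code and a duration in milliseconds. Treat that payload format as an assumption to verify against your Jenkins version; `run_and_report`, `build_payload` and the job URL are hypothetical, and authentication/CSRF handling is left out.

```python
import subprocess
import time
import urllib.request


def build_payload(log: bytes, exit_code: int, duration_ms: int) -> bytes:
    """Build the XML body for an external-job result (format assumed, see above)."""
    # Hex-encode the log so arbitrary bytes survive inside the XML document.
    return (
        "<run>"
        f'<log encoding="hexBinary">{log.hex().upper()}</log>'
        f"<result>{exit_code}</result>"
        f"<duration>{duration_ms}</duration>"
        "</run>"
    ).encode("utf-8")


def run_and_report(cmd, job_url, opener=urllib.request.urlopen):
    """Run a cron command, then POST its output and exit code to Jenkins.

    job_url is hypothetical, e.g. "https://jenkins.example.com/job/legacy-nightly".
    opener is injectable so the HTTP call can be swapped out in tests.
    """
    start = time.monotonic()
    # Capture stdout and stderr together into a single log.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    duration_ms = int((time.monotonic() - start) * 1000)

    request = urllib.request.Request(
        job_url.rstrip("/") + "/postBuildResult",
        data=build_payload(proc.stdout, proc.returncode, duration_ms),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    # Authentication and CSRF crumb handling are omitted from this sketch.
    with opener(request) as response:
        response.read()
    # Pass the exit code through so cron itself still sees failures.
    return proc.returncode
```

The crontab entry would then call a small script that runs `run_and_report(["/path/to/legacy-script.sh"], job_url)` instead of invoking the script directly (paths are placeholders). Returning the original exit code keeps any existing cron-based alerting working during the migration.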
We use Jenkins as a cron replacement too. We've noticed all the benefits you mention, plus it's dead easy for others in the organization to (re)run tasks, even with different parameters.
Here's a small list of things we're getting out of it:
- concurrent run protection (& queue management via https://wiki.jenkins-ci.org/display/JENKINS/Concurrent+Run+B... )
- load balancing (e.g. max concurrent tasks) and remote execution with Jenkins slaves [sounds complicated, but really Jenkins just knows how to SSH]
- job timeouts. No more hanging jobs.
- failure notifications via Slack/HipChat/email/whatever. [email only on status change via https://wiki.jenkins-ci.org/display/JENKINS/Email-ext+plugin ]
- log/history management: rotation & compression.
- fancy scheduling: e.g. run this job once every 24h, but if it fails keep retrying in 5-minute increments (https://wiki.jenkins-ci.org/display/JENKINS/Naginator+Plugin ). You could also use project dependencies for pipelines, but we've been staying away from that.
- monitoring: we use the Datadog reporter & alert on time since last success. Given how mature Jenkins is, this likely translates just as well to whatever system you're using.
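The scheduling and monitoring policies from the list above can be written out as two small functions. This is a plain-Python sketch of the policy, not how Naginator or the Datadog plugin implement it; the rule that retries stop at the next regular slot and the 26-hour alert threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(hours=24)        # regular cadence: once every 24h
RETRY_DELAY = timedelta(minutes=5)  # retry increment after a failure


def next_run(slot: datetime, attempt_time: datetime, succeeded: bool) -> datetime:
    """Return when the job should run next.

    slot:         the regular daily slot this attempt belongs to
    attempt_time: when this attempt actually ran (the slot itself or a retry)
    """
    next_slot = slot + PERIOD
    if succeeded:
        return next_slot
    # Failed: retry 5 minutes later, but never past the next regular slot
    # (an assumption of this sketch).
    return min(attempt_time + RETRY_DELAY, next_slot)


def should_alert(last_success: datetime, now: datetime,
                 max_age: timedelta = timedelta(hours=26)) -> bool:
    """Alert on time since the last *successful* run exceeding max_age."""
    return now - last_success > max_age
```

Keying the alert to the last success rather than the last run means a job that keeps retrying and failing still pages someone, and so does a job that silently stops being scheduled at all.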
It's worked incredibly well for us. We migrated to Jenkins from crontabs with cronwrap (https://github.com/zomo/cronwrap). We're never going back.