
I've been using Dead Man's Snitch[0] in production for a few years. It's been a lifesaver. Not affiliated, just a happy customer.

[0] https://deadmanssnitch.com/




Seconded. DMS is the easiest thing to just drop in on the Nth cron job you add. Eventually you might need something more complicated for monitoring/outages/etc, and that something is probably a whole lot of Nagios and baling wire and/or PagerDuty, but DMS is perfect for "I really need Tarsnap backups to not just silently fail."

I also end up creating a lot of Twilio scripts which are either positive control or negative control for the call/SMS, depending on how critical the thing is that I'm monitoring. For example, one of my sites updates an /api/healthcheck result with a timestamp every five minutes if everything is going peachy, and another box polling that endpoint blows up my phone if it fails to get HTTP 200 and a timestamp within the last five minutes. (This works, but I swear I need to tweak it just a wee bit, as today I had my once quarterly woken-up-at-4-AM-because-gremlins-ate-a-single-HTTP-request.)
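A minimal sketch of the polling check described above, in Python. The five-minute freshness window comes from the comment; the two-consecutive-failures threshold and all names (check_is_healthy, HealthcheckWatcher) are assumptions added for illustration, as one possible way to avoid paging on a single dropped request. The HTTP fetch and the Twilio call are left out on purpose.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)   # freshness window from the comment above
FAILURES_BEFORE_ALERT = 2        # assumption: tolerate one dropped request


def check_is_healthy(status_code, reported_at, now, max_age=MAX_AGE):
    """True if the endpoint answered 200 with a sufficiently recent timestamp."""
    if status_code != 200 or reported_at is None:
        return False
    return now - reported_at <= max_age


class HealthcheckWatcher:
    """Counts consecutive failed polls and says when to page someone."""

    def __init__(self, failures_before_alert=FAILURES_BEFORE_ALERT):
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0

    def record(self, healthy):
        """Record one poll result; return True if an alert should fire now."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, at the moment the threshold is reached, not on every poll after.
        return self.consecutive_failures == self.failures_before_alert
```

With a threshold of two, one lost request stays quiet and the phone only rings if the next poll fails as well, at the cost of one extra polling interval before a real outage is noticed.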


This reminds me of https://docs.google.com/a/gravitant.com/document/d/199PqyG3U... on how you should only wake up engineers when there really is a problem. I'd suggest logging based on error messages -- though I get it, if a problem occurs upstream, you wouldn't know it unless you'd polled for it too, as a data point. HN comments on that doc at: https://news.ycombinator.com/item?id=8450147


Shameless plug: https://healthchecks.io Same idea, open source


Healthchecks.io looks really interesting, both because it's an open source Django project and because I was disappointed with Dead Man's Snitch. DMS forces me to live within their timing for running checks -- if you have something that has to occur at 3am every morning, you won't know it failed until midnight UTC later that day, or when a customer calls to complain.

Healthchecks handles this a lot more sensibly. I might throw it on a Linode and give it a shot. Thanks for releasing it.
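To put rough numbers on the timing complaint, here is a small Python sketch comparing the two ways a missed 3am run could surface. It assumes the 3am job runs on UTC time, that the fixed scheme only evaluates at midnight UTC (as described above), and that the alternative is a per-check period plus grace window; the function names and the one-hour grace value are made up for illustration and are not taken from either service.

```python
from datetime import datetime, timedelta, timezone


def next_fixed_boundary(failed_at):
    """Fixed daily interval: the miss surfaces at the next midnight UTC."""
    midnight = failed_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def deadline_after_last_ping(last_ping, period, grace):
    """Per-check schedule: the miss surfaces once period + grace has elapsed."""
    return last_ping + period + grace


# The job was due at 03:00 UTC and did not check in; its last ping was a day earlier.
failed_at = datetime(2016, 1, 2, 3, 0, tzinfo=timezone.utc)
last_ping = failed_at - timedelta(days=1)

fixed_delay = next_fixed_boundary(failed_at) - failed_at
per_check_delay = deadline_after_last_ping(
    last_ping, period=timedelta(days=1), grace=timedelta(hours=1)
) - failed_at

print(fixed_delay)      # 21:00:00
print(per_check_delay)  # 1:00:00
```

Under these assumptions the fixed boundary leaves a 21-hour blind spot, while the per-check deadline notices after the grace window alone.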


Wow, that's awesome. That really is the biggest problem with DMS. I asked them about that feature a couple of years ago, and they said it was on the roadmap. Might ping them again.


I'll throw in a vote for DMS. I use it at work to verify that our cron jobs ran successfully. Dead simple and very effective.
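For anyone who hasn't wired one of these up, a rough Python sketch of the general pattern: run the job, and check in with the monitoring service only if it exited cleanly, so that silence is what triggers the alert. The URL is a placeholder and run_and_ping is a hypothetical helper, not part of any service's client library.

```python
import subprocess
import sys
import urllib.request

# Hypothetical check-in URL; substitute the one your monitoring service gives you.
PING_URL = "https://example.com/ping/your-check-id"


def run_and_ping(command, ping_url=PING_URL, opener=urllib.request.urlopen):
    """Run `command` (a list of arguments); ping only on exit code 0.

    Returns the command's exit code. `opener` is injectable so the
    function can be exercised without touching the network.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        try:
            opener(ping_url, timeout=10)
        except OSError as exc:
            # A failed check-in should not change the job's own exit status.
            print(f"check-in failed: {exc}", file=sys.stderr)
    return result.returncode
```

From a cron wrapper script you would call something like run_and_ping(["/usr/local/bin/backup.sh"]) (path is illustrative) and exit with the returned code.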



