Dead Man's Snitch – Monitoring for Scheduled Tasks

petercooper · on April 7, 2014

I don't know if this is what led to this going up, but patio11 mentioned this service the other day in his epic post about Tarsnap: http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/ . He uses Dead Man's Snitch to ensure his backups continue working (or more accurately, to notify him when they don't!)

ralfn · on April 7, 2014

Nice branding. Most monitoring tools are overly complicated, both in their features as well as the use-cases they try to cover. I like the clear focus of this; the product itself doesn't require one to invest a lot of time, which is a good thing.

Just some questions:

- how do you prevent unauthorized snitches from screwing with your data? Wouldn't it be wise to limit a snitch to a certain server ip?

- why only an iOS app? This excludes lots of potential customers (including me).

- why not offer an API, so we can securely fetch the status of our snitches?

I like the singular focus. Some people don't want very complicated, feature-rich monitoring systems. I can imagine using a solution like this, especially in those cases where more feature-rich monitoring systems are not being considered.

Yet you offer a pricing model that suggests something completely different: that you are a very complicated, feature-rich monitoring system. The price at each tier is simply not warranted by the value you provide.

I doubt this will have a large uptake, but if it does, i'm the first one tempted to compete. The required architecture is not complicated enough to warrant such high margins on such low costs. Cloud providers, even the budget ones like DigitalOcean, could easily offer something like this for free to their customers.

Now, you could argue that the price includes high quality support. But that's not the audience of such a singular focus tool. If i need support, i'm obviously spending enough time interacting with it that i would be better off with a more feature-rich monitor system (which are available at similar price points).

Who's going to pay 600 dollars a year for 300 snitches? To build this at scale may take more effort, but from the perspective of your customer .. wouldn't most of them be able to build something like this for internal use in a single day? There are open source solutions with more features you can install with a single command.

I'm just wondering: wouldn't you have many more customers, and hence more profit, if you provided 500 snitches at 100 dollars a year? Or alternatively, offer the service for free, but just charge for the apps yearly? In the end, a single snitch will do at most 24 empty GET requests a day, where you store a timestamp. A single VPS should be able to handle 10k snitches easily (less than 3 requests per second).
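
For anyone who wants to check that estimate, here is a quick sketch of the arithmetic in Python. The figures (10k snitches, 24 check-ins a day, the two price points) come from the comment above; the function names are made up for illustration, and the rate assumes check-ins are spread evenly over the day, which real cron schedules usually are not.

```python
# Back-of-the-envelope figures taken from the comment above.
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_second(snitches, pings_per_snitch_per_day):
    """Average inbound request rate, assuming pings are spread evenly."""
    return snitches * pings_per_snitch_per_day / SECONDS_PER_DAY


def price_per_snitch(dollars_per_year, snitches):
    """Yearly price of one snitch on a given plan."""
    return dollars_per_year / snitches


rate = requests_per_second(10_000, 24)
print(f"{rate:.2f} requests/second on average")
print(f"${price_per_snitch(600, 300):.2f} per snitch per year (current plan)")
print(f"${price_per_snitch(100, 500):.2f} per snitch per year (proposed plan)")
```

Since cron jobs tend to fire on the hour, the peak rate would likely be well above this average.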

jameskilton · on April 7, 2014

One of the developers here. Glad you liked the branding! As you've mentioned, our goal is to stay stupidly simple to use. We want users to forget we're even here, until something isn't working. Now as for your questions:

> - how do you prevent unauthorized snitches from screwing with your data?

We don't currently have such protections in place basically because it hasn't been necessary. We keep an eye on things and if we start to see such traffic we'll investigate what should be done at that time.

> - why only an iOS app? This excludes lots of potential customers (including me).

Time and customer requests. We've started looking into building an Android app and glanced at Windows Phone as well. That said, mobile apps aren't required to use DMS, the default is to send emails.

> - why not offer an API, so we can securely fetch the status of our snitches?

We have actually just started rolling out a new API! If you're interested in helping us beta test this, get a hold of us at hi@deadmanssnitch.com.

As for pricing, plans are built around the peace-of-mind our product offers customers. Our canonical example is an ops engineer needing a backup only to find out that backups haven't worked for the past weeks/months/etc. At that point the cost of DMS is a pittance compared to what the lack of backups just cost the company in question.

nikatwork · on April 8, 2014

I would ignore the feedback around pricing. Anyone who beefs over $600 and thinks they "won't need support" is not worth having as a customer. $600 is fine for SMB and wouldn't even register for enterprise.

struppi · on April 7, 2014

You do realize that 600 dollars a year is next to nothing for a business? I don't even run a business, I am just a freelancer, but I'd happily pay 600 USD/year for this service if I needed it.

The price doesn't make a real dent in any serious budget, so why bother? And 600 dollars is really cheap compared to anything that could happen when you miss an important event. Email campaign not sent? This could cost thousands of dollars or more. Backup not done? Could be millions of dollars for some kinds of businesses.

I think the service is priced reasonably. If you think otherwise, start a competitor. If you can convince me that your service will reliably mail me on incidents no matter what catastrophe happens to your data center, I might use it over theirs (Should I ever need such a service. As I said before, I don't need it right now.).

ralfn · on April 7, 2014

>You do realize that 600 dollars a year is next to nothing for a business?

I get that. But at that price point, there are similar solutions available with much more coverage and features. This isn't actually competing with DataDog, Munin, Nagios, New Relic on features or reliability.

>? This could cost thousands of dollars or more. Backup not done? Could be millions of dollars for some kinds of businesses.

If that's the business risk you are taking, you shouldn't be depending on just a ping after a cron job! Phooff. You do realize this thing is essentially only measuring whether the cron job has executed, not whether it was successful. Sure, you can script the success/failure state. But how are you going to test that against a myriad of potential failure states? What does the process return when the OOM killer gets it? Do you know?

The only reason to use something that 'just does pings' is because it's not important enough, and you don't want to invest that much time and effort. Those other tools need to be an integral part of your setup, but they'll give you much better monitoring capabilities and actual peace of mind. More importantly, rather than warning you when things have gone wrong (after the fact), they'll warn you in advance that things are not looking good. They'll monitor the actual process: not whether it was launched every hour, but what it was up to.
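
On the OOM killer question specifically: the Linux OOM killer ends a process with SIGKILL, so the program never gets to pick an exit code. The sketch below assumes a POSIX system with `sh` on the PATH and simulates the kill by having a shell send SIGKILL to itself. The helper name `run_and_describe` is made up for illustration.

```python
import signal
import subprocess


def run_and_describe(argv):
    """Run a command and describe how it ended."""
    code = subprocess.run(argv).returncode
    if code < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        sig = signal.Signals(-code)
        return f"killed by {sig.name} (a shell would report {128 + sig.value})"
    return f"exited with status {code}"


# Stand-in for an OOM kill: the child shell sends SIGKILL to itself.
print(run_and_describe(["sh", "-c", "kill -9 $$"]))
print(run_and_describe(["sh", "-c", "exit 3"]))
```

Either way the status is non-zero, so a `job && curl ...` chain would skip the check-in for a job killed this way.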

This product is in a very interesting niche because it seems to target projects-not-important-enough-i-wouldnt-bother-but-if-its-this-easy-why-not. But it is not priced accordingly.

>Should I ever need such a service. As I said before, I don't need it right now.

The projects where you don't need it: those are the only ones where a product like this makes sense. If there are lots of dollars on the line, i strongly suggest you use actual industrial-grade monitoring solutions, not a ping after a cron job.

frossie · on April 7, 2014

>But at that price point, there are similar solutions available with much more coverage and features.

Exactly. I sit on a quarter-million dollar budget for hardware and services, but I still have a value-for-money filter, and this doesn't pass it. I might as well use this:

https://github.com/grahambell/crab

for free, or pay for a monitoring solution that is higher reliability, is inside my network, and doesn't charge me per cronjob. Even at the bulk plan, half a cent for a single cron job to send a single daily http request and only alert the one guy who has an iPhone seems... unreasonable. It doesn't scale for me.

By way of comparison, for that price I can get 20 private repos on Github, which consume oodles of space and networking resources.

mgkimsal · on April 7, 2014

"I don't even run a business, I am just a freelancer, but I'd happily pay 600 USD/year for this service if I needed it."

Yes, you do run a business, yourself. But outside that...

The problem with this sort of thinking is that, yes, sure, if you need it, you could happily pay $600/year for this service, and another, and another, and another, and pretty soon you're dropping $5-$10k on services. For a large business, that's probably not an issue (but even then, it's always out of someone's budget, not the entire company's), but for smaller companies, you're approaching a pain threshold.

ralfn · on April 7, 2014

>If you can convince me that your service will reliably mail me on incidents no matter what catastrophe happens to your data center

You do realize that it would take years to even establish a significant baseline against a simple VPS on any cloud provider in uptime?

It's much more important that you know which cloud provider and data-centers they are using, so you can verify you aren't actually running your servers in the same data-center.

Not that they mention it on their site, but a simple traceroute will tell you their ping-server is just a (bunch of?) Amazon instance(s). And Amazon is never down, right?

If somebody can convince you that their service will never be down, it just means you're gullible.

On the other hand, if your monitoring service is in a different data-center from a different cloud provider, statistically speaking, chances are low that your system and their system will be down at the same time. But you can expect a few false positives. With this kind of setup, you could get false positives just because outside-internet is not working.
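
To put a rough number on "chances are low": a sketch under the assumption that the two systems fail independently, which is the reason for wanting a different provider and data-center in the first place. The 99.9% uptime figures are invented purely for illustration and are not measurements of any real service.

```python
# Assumed availability figures, chosen only for illustration.
YOUR_UPTIME = 0.999      # your own hosting
MONITOR_UPTIME = 0.999   # the external monitoring service
MINUTES_PER_YEAR = 365 * 24 * 60


def joint_downtime_fraction(uptime_a, uptime_b):
    """Fraction of time both systems are down, assuming independent failures."""
    return (1 - uptime_a) * (1 - uptime_b)


fraction = joint_downtime_fraction(YOUR_UPTIME, MONITOR_UPTIME)
print(f"both down for about {fraction * MINUTES_PER_YEAR:.2f} minutes per year")
```

If both systems sit in the same data-center, the independence assumption no longer holds and the real overlap could be far larger.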

Again, one would not use something like this for sensitive large scale productions. But very few large scale productions are sensitive, and very few sensitive productions are large scale.

Yet, if it's this easy, why not slap it on every system anyway? Oh wait, the price.

samcrawford · on April 7, 2014

Small gripe, based upon prior painful experience... you should always set a timeout on curl requests (or any others for that matter). There's not one by default!

It's -m <seconds> in the curl command line client.
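
The same advice applies if the check-in is sent from a script rather than from curl. A minimal sketch using Python's standard `urllib`: the function name `check_in`, the 10 second default, and the injectable `opener` parameter (there only so the function can be exercised without a network) are all assumptions for illustration.

```python
import urllib.request


def check_in(url, timeout_seconds=10.0, opener=urllib.request.urlopen):
    """GET the check-in URL; return True on a 2xx reply, False on any network error.

    The explicit timeout keeps a hung connection from blocking the job forever.
    """
    try:
        with opener(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


# Example call (not executed here), using the placeholder URL from this thread:
# check_in("https://nosnch.in/abc123", timeout_seconds=10)
```

Returning False instead of raising means a failed check-in cannot crash the job itself; the missing ping is what triggers the alert.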

jameskilton · on April 7, 2014

Good point, we'll make a note to include this in our documentation!

bowlofpetunias · on April 7, 2014

Been using DMS for quite a while now. It is an absolute must-have for anybody depending on scheduled tasks, and I haven't found anything like it.

And it's so cheap that there is no reason not to do it.

My only gripe is that I can only get alerted via email, which is the most often overlooked channel. I've hooked those emails up to HipChat, but some direct options for services like HipChat, Pagerduty etcetera would be nice.

ralfn · on April 7, 2014

Doesn't it do push notifications on iOS?

jameskilton · on April 7, 2014

Yes, the iOS app has full push notification support! To suggest other integration points, please contact us at hi@deadmanssnitch.com.

mpclark · on April 7, 2014

As an aside, does anyone make a cron panel for humans? I'd find that really useful as I'm usually so busy being distracted by the next big thing that I forget to take care of repetitive tasks.

I know I can achieve something like that with my Google Calendar but I don't like the way repeating events fill it up -- it would be better if they just showed for the next occurrence.

icebraining · on April 7, 2014

Why not make a second calendar in GC, put the repeating events there (with notifications enabled), then hide it from the main view?

gtCameron · on April 7, 2014

I use FollowUpThen - https://www.followupthen.com/

It does a lot of stuff, but I have it email me every Thursday to remind me to take the trash to the curb. It's just an email, not a calendar alert, but it keeps the clutter off the calendar and I usually see an email within a couple hours anyway.

kareemm · on April 7, 2014

Been using DMS for years now (it was recently bought by Collective Idea). Love it so much that I'm on the testimonials page.

The reason we started using it is that it's hard to notice the absence of an email alerting you that job xyz is complete. E.g. when our nightly backup would complete, we'd get an email. We'd also get N other nightly emails about other tasks that would run.

On the rare occasion that e.g. backup wouldn't complete it would be really tough to notice that you didn't get the email.

What you really want is to hear nothing UNLESS something goes wrong, which DMS makes happen. Love love love it.

apinstein · on April 7, 2014

Nice job! -- we actually needed this so badly a few years back that I wrote a simple app in a day to do this myself, though it's not very robust. I am actually surprised it's taken so long for someone to offer a SaaS version. Your pricing is so reasonable that I'd switch to your app if you could just flesh out one more feature.

I actually don't want another app for reporting. I already use a monitoring service (pingdom) and would prefer to keep all of our "alerts" organized through a single service.

Could you offer a http-based public status page? Our system allowed each snitch to be tagged with a tag like "net.tourbuzz.db.replication.upToDate" and then a url like http://foo.com/deadoralive?filter=net.tourbuzz would allow us to use a single Pingdom monitor to alert us if any of snitches in that namespace failed.
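
Roughly what that filter does, as a sketch: the function name, the dict of tag to healthy/unhealthy, and the 200/503 status codes are assumptions for illustration, not how our app or DMS actually works. The idea is that one HTTP monitor that treats a non-2xx response as a failure can cover a whole namespace.

```python
def namespace_status(snitches, prefix):
    """Return (http_status, body) for all snitches under a dotted tag prefix.

    `snitches` maps a tag to True (checked in on time) or False (overdue).
    """
    matching = {
        tag: healthy
        for tag, healthy in snitches.items()
        if tag == prefix or tag.startswith(prefix + ".")
    }
    failed = sorted(tag for tag, healthy in matching.items() if not healthy)
    if failed:
        return 503, "dead: " + ", ".join(failed)
    return 200, "alive"


snitches = {
    "net.tourbuzz.db.replication.upToDate": True,
    "net.tourbuzz.backup.nightly": False,
    "net.example.unrelated": False,
}
print(namespace_status(snitches, "net.tourbuzz"))
print(namespace_status(snitches, "net.tourbuzz.db"))
```

Note that a prefix matching nothing reports "alive" in this sketch; a real endpoint would probably want to flag that case separately.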

This is a simple tagging structure that makes it easy to organize large numbers of snitches into manageable groups and easily integrate it into existing monitoring/alerting infrastructure.

Feel free to email me directly if you want more info.

smcleod · on April 7, 2014

Looks very interesting, but unless I can host it myself it's a no-go for me.

bowlofpetunias · on April 7, 2014

Uhm, the whole point of services like these is that they are external and not tied into your own hosting.

Also, it knows absolutely nothing about your services and doesn't require any level of access. All you do is ping it.

0xbadcafebee · on April 7, 2014

It requires internet-level access. So internal hosts that shouldn't have internet access (like a backup machine, where all your most critical data is stored) might not be able to reach it.

rhoml · on April 8, 2014

So, in case you didn't notice: this can be a way to let others take care of notifying you when cron jobs fail, so you can focus on more productive tasks.

genericacct · on April 7, 2014

I can set it up for you on your *ix machine in an afternoon for $200 :)

mmelin · on April 7, 2014

Looks pretty cool! You should really rework the Plans page. At first I didn't realize there was more content below the "Free" plan, because I'm on a 13" OS X machine so scroll bars aren't visible by default.

zufallsheld · on April 7, 2014

If you are like me and already have nagios and nrpe in use, you can use a simple python script[0] to monitor the output of the cronjob and get an alert by nagios. [0]https://github.com/rndmh3ro/check_exit-code

KaiserPro · on April 7, 2014

as much as I hate to admit it, jenkins makes a superb replacement for cron.

It keeps a history, has excellent error handling capabilities, and can scale quite well.

also the integration with git and the like makes updating jobs super easy.

khill · on April 7, 2014

Hudson/Jenkins works well for us. We've moved all our scheduling jobs out of cron and other uglier options (like maestro) into Hudson.

In addition to sending emails when things break, it can also post to our Twitter feed and send SMS alerts.

cjjuice · on April 7, 2014

I like it. Pricing seems a bit high though. Development wise this is a 1-2 day project.

I can't see myself spending much on something I could roll my own version of in a day or two.

lifeisstillgood · on April 7, 2014

So instead of MAILTO=sendmail I use MAILTO=http://deadmandssnitch/report or similar?

0xbadcafebee · on April 7, 2014

That would actually be more stable, since it would rely on a network designed for fault-tolerant redundant store-and-forward messaging.

Instead, each cron job runs an HTTP GET, and later you can view the service's website to see the tests that have requested successfully or haven't responded at all. Unfortunately for your system, if there's a network hiccup, you won't know why your jobs failed and the job will hang on the HTTP call. Also, you're paying for it instead of just sending e-mail alerts to yourself.

patio11 · on April 7, 2014

> Unfortunately for your system, if there's a network hiccup, you won't know why your jobs failed and the job will hang on the HTTP call. Also, you're paying for it instead of just sending e-mail alerts to yourself.

Critically importantly for this use case, if you are doing something critical for your business (say, backups or your daily billing run or sending reminders to thousands of people or what have you) and something goes wrong with your network and makes you unable to curl, you get told about that. If you instead do self-hosted email, and your network fails, you don't get told about that.

It's fail-open vs. fail-close. That's literally so core to the product it is alluded to in the name. A Dead Man's Switch doesn't go off if anything goes wrong, it goes off if everything doesn't go right.
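
The server-side rule behind that is tiny. A minimal sketch, not DMS's actual implementation: the function name `is_overdue` and the `grace` parameter are made up for illustration. The alert condition is "no check-in arrived in time", whatever the reason.

```python
from datetime import datetime, timedelta


def is_overdue(last_check_in, interval, now, grace=timedelta(0)):
    """True when no check-in has arrived within interval + grace of the last one."""
    return now - last_check_in > interval + grace


last = datetime(2014, 4, 6, 6, 0)   # yesterday's 06:00 job checked in
daily = timedelta(days=1)
hour = timedelta(hours=1)

# 06:30 the next day: still inside the one-hour grace window.
print(is_overdue(last, daily, datetime(2014, 4, 7, 6, 30), grace=hour))
# 08:00 the next day: nothing arrived, so the alert fires.
print(is_overdue(last, daily, datetime(2014, 4, 7, 8, 0), grace=hour))
```

A crashed job, a dead host, and a broken network all look the same to this check, which is the point.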

0xbadcafebee · on April 7, 2014

Yes, I understand the concept. But it's actually so open-ended that you get woken up in the middle of the night if something doesn't go right, not everything.

If you have intermittent packet loss on your internet connection you get constant alerts about your backups not working, even though it's just the internet that's wonky. Or if this service isn't set up robustly enough, you miss your alerts when they go down. Either way you get no context about the failure, just obscure panic in the middle of the night. On top of that an attacker can enumerate the URLs (or find it via some other means) and send false requests while they take down your service without you noticing, or the opposite effect with a DDoS on the provider.

I'm not saying it's not a useful service. I'm sure plenty of people would rather pay for this thing and have some peace of mind rather than nothing (because most of these users probably aren't technical enough to do something identical in Google App Engine or a VPS). I just have high expectations for solutions that you have to pay for.

Sanddancer · on April 7, 2014

The problem with that setup is that you get alerts that will get lost in the noise. With systems like Nagios, I can set up dependencies, so that I won't get a flood of alerts when the network goes down, or set up maintenance windows so that when I am working on a system, I won't get spurious alerts. Also, self-hosted tools like Nagios mean that I can hook it up to send an SMS or use a backup connection, or run a script that triggers an arduino-controlled bat signal to alert me. None of this is really possible with DMS.

sedev · on April 7, 2014

That's only an argument that there are some cases where DMS is inappropriate, not that it's inappropriate for all cases. Considering that in general, setting up a robust Nagios install, configuring it, and maintaining it, costs far more than DMS, there are many scenarios where DMS is the rational choice.

nimblegorilla · on April 7, 2014

The biggest problem I see with this service is that exit code 0 does not guarantee the process successfully did what was intended (only that the process ended without errors). This solution seems to leave too many holes for false positives and false negatives to sneak through. When paying for services like this I want it to be bullet-proof rather than bullet-resistant.

jameskilton · on April 7, 2014

Actually something like:

0 6 * * * /bin/sh my_cron_job.sh && curl https://nosnch.in/abc123

Then, when my_cron_job.sh fails (non-0 exit status), no snitch is triggered. DMS notices the missing check-in and notifies you that the job did not run.
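
If exit status alone is not enough of a signal, the same idea extends to a small wrapper. This is a sketch only: `run_then_check_in`, the `verify` hook, and the `notify` callback are hypothetical names, and `notify` stands in for whatever sends the HTTP request to the snitch URL.

```python
import subprocess
import sys


def run_then_check_in(argv, notify, verify=lambda: True):
    """Run argv; call notify() only if it exits 0 and verify() also passes.

    Returns the job's exit status. `verify` is an optional extra check,
    for example "the backup file exists and is not empty".
    """
    status = subprocess.run(argv).returncode
    if status == 0 and verify():
        notify()
    return status


pings = []
# A job that succeeds: the check-in is sent.
run_then_check_in([sys.executable, "-c", "pass"], lambda: pings.append("ok"))
# A job that fails: no check-in, so the snitch would eventually alert.
run_then_check_in([sys.executable, "-c", "raise SystemExit(1)"], lambda: pings.append("ok"))
print(pings)
```

The `verify` hook is where a job-specific sanity check would go, so that an exit status of 0 is not the only thing standing between you and a silent failure.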

spncr2 · on April 15, 2014

The referral feature is nice though. Free snitches for every person you refer.