
A global job timeout might be unreasonable when there's high variance in the workload, e.g. some jobs take 0.5 seconds and others 30 seconds. You might set a global timeout of, say, 60s, but it sucks to wait 59.5s to reap that short job whose worker crashed. A better system is to have workers update a timestamp on an interval and reap any job whose timestamp hasn't been updated in N seconds.
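
A minimal sketch of that heartbeat/reaper pattern, here using sqlite3 from Python's standard library so it runs standalone; the jobs table, its columns, and the 15-second threshold are illustrative assumptions, not anyone's production schema:

    import sqlite3, time

    REAP_AFTER_SECONDS = 15  # assumed threshold; tune per deployment

    db = sqlite3.connect("queue.db")
    db.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY,
        state TEXT NOT NULL DEFAULT 'pending',  -- 'pending' or 'running'
        locked_by TEXT,                         -- id of the worker holding the job
        heartbeat_at REAL                       -- unix time of the last heartbeat
    )""")

    def heartbeat(worker_id):
        # Called by each worker on an interval while it holds jobs.
        db.execute(
            "UPDATE jobs SET heartbeat_at = ? WHERE locked_by = ? AND state = 'running'",
            (time.time(), worker_id))
        db.commit()

    def reap_stale_jobs():
        # Return any job whose worker hasn't checked in recently to 'pending'.
        cutoff = time.time() - REAP_AFTER_SECONDS
        db.execute(
            "UPDATE jobs SET state = 'pending', locked_by = NULL "
            "WHERE state = 'running' AND heartbeat_at < ?",
            (cutoff,))
        db.commit()

The reaper can be a cron job or just a loop in one of the workers; since the reap is a single idempotent UPDATE, it's safe even if several reapers fire at once.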



It's a trade-off between updates per second and reaping latency.

Maybe simply using a timeout per job type is a better way (though that of course trades away some simplicity).
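
Continuing the sketch above, a per-job-type reaper only needs a started_at time on the job plus a timeout per type; the job_types table and its timeout_seconds column are hypothetical:

    def reap_by_job_type(db):
        # Reap running jobs that have exceeded their type-specific timeout.
        # Assumes jobs also has: job_type TEXT, started_at REAL (unix time),
        # and a table: job_types(name TEXT PRIMARY KEY, timeout_seconds REAL).
        db.execute("""
            UPDATE jobs SET state = 'pending', locked_by = NULL
            WHERE state = 'running'
              AND started_at < ? - (SELECT timeout_seconds FROM job_types
                                    WHERE job_types.name = jobs.job_type)
        """, (time.time(),))
        db.commit()

No heartbeats are needed, so the write load stays flat as workers scale, at the price of maintaining sensible timeouts per type.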


I agree. The frequency of updates also becomes more of an issue as you add workers. Say you have 1000 workers each updating every 2 seconds: that's ~500 timestamp update statements per second, which is not trivial in terms of added load on the DB.





