
Deployment and maintenance should not happen in the middle of the night. We've known this for years.

You design your systems to degrade gracefully, hopefully transparently. Screwing up your engineers' sleep schedules is a terrible idea, so you deploy during the day when people are awake, clear-headed, and on the clock.

I'll take a 1PM outage over a 1AM outage any day. If that's bad for business, business should pay for the engineering time to build a more resilient system.



This is not how businesses are run. No sane person running a business will take a 1PM outage over a 1AM one. The whole point of Heroku is that you rely on them to handle much of the infrastructure because you are low on manpower. The resilient system will be built over time and is being worked on. But blaming Heroku's customers for the outage is nonsensical.

People do night shifts in so many industries. There's nothing wrong with that; it's just part of the job. My building's security guard works a night shift, and what he does is far more important, and needs more attention, than an engineer's work. Unless you think we shouldn't have security guards at night.


Very few professionals do night shifts. Equating a security guard with IT work is insane. The intellectual demands are nowhere near equivalent.

For professional examples of "night shifts", we have ER doctors. They do work 12-24 hour shifts -- but they're allotted a room they can sleep in between patients, and they often have 2-7 days between their shifts.

You going to pay infrastructure staff what you pay ER doctors for as much on-task work? Are those people going to actually know what to do when you put them on-task?

What happens when the IT staff needs to get a developer on-hand to resolve an issue? Wake them up?


(for cases like Heroku) When infrastructure staff fuck up, people don't die. There's a massive difference between "make sure a few scripts I wrote earlier work right" and "patch up the guy who can see his guts cause some drunk driver ran a red light at 3 AM"

>What happens when the IT staff needs to get a developer on-hand to resolve an issue? Wake them up?

Yes. Have you never worked at a place with an on-call dev team?


Eh, the better option would be to run an EU team and a US West team. A 9-hour time difference lets you do something like:

10am - 10pm [on call] for US West.

7am - 7pm [on call] for EU [UTC +1]

4 days on / 3 days off. [48 hr on call, 40 hours of work]
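A quick way to sanity-check a follow-the-sun rota like this is to map each shift onto UTC and confirm the two shifts tile the day with no gap or overlap. This is a minimal sketch assuming fixed offsets (US West at UTC-8, EU at UTC+1, as in the comment); it ignores daylight saving time, which the US and EU switch on different dates, so for a few weeks a year the handoff drifts by an hour. The `utc_hours` helper is hypothetical.

```python
def utc_hours(start_local, end_local, utc_offset):
    """Return the set of UTC hours covered by a shift running
    [start_local, end_local) in a zone at a fixed UTC offset (in hours)."""
    covered = set()
    hour = start_local
    while hour != end_local:
        covered.add((hour - utc_offset) % 24)  # local wall-clock hour -> UTC hour
        hour = (hour + 1) % 24
    return covered


us_west = utc_hours(10, 22, -8)  # 10am-10pm at UTC-8 -> 18:00-06:00 UTC
eu = utc_hours(7, 19, 1)         # 7am-7pm at UTC+1 -> 06:00-18:00 UTC

print(len(us_west | eu))  # 24: every UTC hour has someone on call
print(us_west & eu)       # set(): no overlapping hours
```

If either shift were shortened, the union would drop below 24 hours and the gap would show up immediately.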

It's doable. It's just expensive to cover two jurisdictions and maintain four shifts' worth of people.

No system is ever going to be 100% reliable and cost effective.


You're right, totally doable. I should have said 24/7 for the price of 8/5 is going away.


Yep, and paying for 8/5 and getting 24/7 is unreasonable. ;)


Plenty of sane people prefer a 1PM outage over a 1AM. Everybody is in the office. Everybody is wide awake.

Of course people do night shifts, but at the same time, we've learned that you're better off fixing things during the day shift. The night shift is an emergency crew.

And the idea that a building security guard needs to pay more attention than an engineer is beyond ludicrous.


> No sane person running a business will take a 1PM outage over 1AM.

Hey, what about those of us in Asia who use Heroku? An outage in the early US morning is likely to land in our early afternoon.


I think continuous deployment is an important point here. Deploying twice a day seems logical and minimal to me, even if each deployment ships code from the previous day (a one-day gap).
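The twice-a-day, one-day-lag cadence can be sketched as a selection rule: at each scheduled deploy, ship the newest build that is at least a day old. This is a hypothetical illustration; `pick_release` and the build history below are made up for the example, and a real pipeline would pull builds from its CI system.

```python
from datetime import datetime, timedelta


def pick_release(builds, deploy_time, min_age=timedelta(days=1)):
    """Return the newest build that is at least `min_age` old at deploy time.

    `builds` is a list of (build_id, built_at) tuples; returns None when
    nothing is old enough yet.
    """
    eligible = [b for b in builds if deploy_time - b[1] >= min_age]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b[1])[0]


# Hypothetical build history.
builds = [
    ("r1", datetime(2024, 6, 3, 9, 0)),
    ("r2", datetime(2024, 6, 3, 15, 0)),
    ("r3", datetime(2024, 6, 4, 9, 0)),
]

# Two scheduled deploys per day, each shipping day-old code.
print(pick_release(builds, datetime(2024, 6, 4, 10, 0)))  # r1
print(pick_release(builds, datetime(2024, 6, 4, 16, 0)))  # r2
```

The lag gives each build a full day in staging or canary before it reaches every customer, at the cost of slower delivery of fixes.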


That's like, three logical fallacies tied together.

You've made an argument to authority with your lead-off sentence, "that's not how businesses are run", and a No True Scotsman argument with "no sane person running a business", and then tied it up with a reductio ad absurdum analogy about security guards.

There are plenty of shops that I've personally worked with, and many more that I've read about here on HN that do daytime maintenance / make potentially breaking changes to production. I'd be willing to bet Heroku does a mix of day and night time scheduled maintenances, with scheduling determined by a reasoned analysis of the risks and potential for an outage.

If you think a 1 AM maintenance window helps anything, I'd bet you've never seen the sunlight come up as you staggered out of a datacenter 10 hours after your "midnight maintenance" went south, and no one with an optical light meter to help you trace down a dodgy fiber run was awake at your transit provider, so you had to wait until the first shops opened to buy your own...

...ok, maybe I'm a little bitter. What I'm getting at is, a 1 PM outage - although more customer facing than a 1 AM outage - will almost definitely be resolved faster, because resources outside of the engineers performing the maintenance are in abundance at 1 PM and not 1 AM.

Bonus points: your engineers will not be zombies for the next three days, and that's important both for post-outage work and to make sure everyone stays sharp for the potential "bounce outage" (e.g. your first outage was the core router dying, the second will hit the same week when your new replacement core router's slightly upgraded IOS version causes a BGP flap under certain conditions that don't arise until a good 48 hours after you mail out your customer outage report....)

That might sound trivial, but at scale - if you have tons of engineers and are doing tons of maintenance all the time - it's actually really important, at least IMHO.

(Quick, take a guess - how many Heroku daytime maintenances do you think went off without a hitch that you've never heard of?)

P.S. what time zone are you in? I assume your time zone's 1 AM is the important one, so someone else on the other side of the country is going to either have a late evening or early morning maintenance window..... ;)


I have never lived in a place with security guards at night. I think a big part of freedom and privacy is dead when you have to have security guards. Off topic, sorry.


No one with half a backbone should be willing to do scheduled work all day and then during any portion of the night.

You need to build a resilient system, and that's gonna cost you. It's a lot less fun than adding features, which is why business people tend to put it off and instead manipulate the tech people into working crazy hours.

The 24/7 web culture is dying. Adjust your business plan accordingly.


You don't have to. You can take the prior or following day off after the overnight work.


I was never offered this option by any of my previous employers.


That sucks. Whilst we're able to do most work during the 9-5, there's the odd time I'll come in at 5 or the IT manager works a Saturday because we have to take down key infrastructure to work on it.

If your working environment is good, it's pretty easy to just say "I'll take tomorrow off and come in to do it on Saturday" without feeling like you're getting taken advantage of.


You have no idea what you're talking about.


As a seasoned engineer, I could not agree more with the justifications you provide.

As a business owner whose customers are actively engaged with my staff, who need servers and resources to be available during the day when, you know, my CUSTOMERS are awake, I would completely ignore your recommendation for outage windows.

You schedule outages around the people who pay the bills, not the people who don't.


The plan is to never have an outage. Instead, schedule maintenance that should, at worst, result in some backed-up queues or non-functional admin functions.

Building a system for graceful degradation costs time and money.
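The "backed-up queues" idea can be sketched as a write path that buffers work while a dependency is down for maintenance and drains the buffer once it returns. This is a minimal illustration; `DegradingWriter` and the `backend_up` flag are hypothetical names, and a real system would use a durable queue rather than an in-memory deque so buffered writes survive a restart.

```python
from collections import deque


class DegradingWriter:
    """Writes go straight to the backend when it is up; otherwise they queue."""

    def __init__(self):
        self.backend_up = True   # hypothetical health flag, flipped during maintenance
        self.stored = []         # stands in for the real datastore
        self.pending = deque()   # in-memory buffer; a real system would persist this

    def write(self, item):
        if self.backend_up:
            self.stored.append(item)
        else:
            # Degrade gracefully: accept the write now, apply it later.
            self.pending.append(item)

    def drain(self):
        """Replay buffered writes in arrival order once the backend is back."""
        while self.backend_up and self.pending:
            self.stored.append(self.pending.popleft())


w = DegradingWriter()
w.write("a")
w.backend_up = False  # maintenance window starts
w.write("b")
w.write("c")
w.backend_up = True   # maintenance window ends
w.drain()
print(w.stored)  # ['a', 'b', 'c']
```

Users keep getting successful writes throughout; the maintenance only shows up as a delay before "b" and "c" are visible.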


Number of banks or hospitals alrs has worked at: 0

Hope for the best, and plan for the worst: Build resilient systems when possible, but why risk (or guarantee) outages during the day? OS upgrades, router swapouts, and so on can NOT take place during the day. That would be foolish in most industries.

Also realize that "just make systems resilient" is not an easy thing when multi-million dollar transactions are occurring on 30 year old code. If everything were greenfield, it'd be different. But it's not.


Always be proving that your infrastructure is resilient. Pain should be felt by stakeholders, not engineering.

When 24/7 means millions of dollars, hire a four-shift redundant geo-distributed team.


Thing is, a lot of places are doing that. The downside is the geo-distribution gives us Indian contractors with strange accents, inability to communicate and assert, and overall a huge communications issue by default.

Australia needs to step up its subsidies to IT outsourcing firms.


For those saying that the parent comment is crazy, or not applicable to "real businesses" - nearly every production push at Google is done during the normal workday hours of the respective development team (which means most of them happen between 9am-5pm PST).

The benefit of having the engineering team around to troubleshoot smaller-scale issues before they turn into large-scale outages vastly outweighs the small benefit of potentially moving the extremely rare massive outage to a less busy time of day.


Is that true? I'm not being sarcastic -- I just have always generally assumed that maintenance was better performed "off-hours".

Also, 1pm and 1am aren't the only options. 8am EDT would have been within the work day but still avoided peak usage in the US.


Yes, Amazon have a rule to deploy during working hours, during core hours when engineers are around and available should things go wrong. Also, Amazon is one of a number of organisations who don't deploy releases on Friday.
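A rule like this is easy to enforce mechanically as a guard in a deploy script. This is a sketch of a generic policy, not Amazon's tooling: the 10:00-16:00 core hours are an assumed value, and weekends are excluded on the assumption that "working hours" means weekdays.

```python
from datetime import datetime

# Assumed policy values; the comment only says "core hours" and "not on Friday".
CORE_START_HOUR = 10
CORE_END_HOUR = 16
FRIDAY = 4  # datetime.weekday(): Monday is 0, Sunday is 6


def deploy_allowed(now):
    """Allow deploys only Monday to Thursday, during core hours."""
    if now.weekday() >= FRIDAY:  # Friday, Saturday, Sunday
        return False
    return CORE_START_HOUR <= now.hour < CORE_END_HOUR


print(deploy_allowed(datetime(2024, 6, 4, 11, 0)))  # Tuesday 11:00 -> True
print(deploy_allowed(datetime(2024, 6, 7, 11, 0)))  # Friday 11:00 -> False
print(deploy_allowed(datetime(2024, 6, 4, 23, 0)))  # Tuesday 23:00 -> False
```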


Amazon does not deploy changes during times that they feel can be customer impacting. For North America this means that deployments will be during the night.


IF you can afford the service interruption, which is true for lots of companies and their internal IT systems, you always follow this policy. All concerned including the users are awake and on their normal sleep and work cycle, if you mess something up you find out immediately from those users, etc. etc.


8AM EDT is 5AM PDT, which is where web tech can be found in the United States.

I'm sure you can find engineers willing to work garbage-man hours, but it's gonna cost you.
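The conversion itself is simple offset arithmetic: EDT is UTC-4 and PDT is UTC-7, so the West Coast is three hours behind. A minimal sketch using fixed summer offsets (the date is arbitrary; a production tool would use the `zoneinfo` module so daylight saving transitions are handled for it):

```python
from datetime import datetime, timedelta, timezone

# Fixed summer offsets; zoneinfo would handle DST transitions automatically.
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

window = datetime(2024, 6, 4, 8, 0, tzinfo=EDT)  # arbitrary example date
print(window.astimezone(PDT).strftime("%H:%M %Z"))  # 05:00 PDT
```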


> 8AM EDT is 5AM PDT, which is where web tech can be found in the United States.

I'm not sure if you're just being sarcastic, but talent in North America isn't limited to the Pacific Timezone, and such a claim borders on hilarious delusion. I don't say hilarious emotionally or pejoratively, but rather it is actually ha ha funny that anyone would actually believe that.

However yes, there are endless loads of top-skill talent that will happily do "maintenance" in the middle of the night. I've done it on occasion, and slept in the next day. Big deal. Half the time I simply took a timeout from some online gaming I'd been doing.


We don't have "loads" of top-skill talent to begin with.

The pool of top-skill-talent-with-a-self-esteem-problem is rapidly diminishing.


To do something off hours implies a self-esteem problem? That is absurd. People do stuff off hours in return for something, which might be compensatory "time off", additional pay or bonuses, etc.


To do scheduled off-hours work on top of scheduled on-hours work and not get paid for it indicates that the worker has a damaged sense of self-worth.


It's all about reducing the risk. Accidents happen no matter how well you've engineered your maintenance and deployment practices.

Do everything you can to reduce the risk, including building a resilient system, AND not taking a major swath of it offline during a high traffic period.


If you are a company who is using Heroku, you likely do not have the resources (knowledge or money) to set up a system which degrades gracefully when your entire hosting platform goes down. That's a hard, and expensive, contingency to plan for.

And even if they did build their service in a way which could tolerate a complete hosting platform failure - degrading a core service during your customers' business hours is enough to make heads roll. It's just stupid.


No, that depends on what your traffic curve looks like. For me, I have to do my maint at 12AM PST, because that is when we are in a trough with respect to traffic.
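Finding the trough can be automated: slide a window of the maintenance length over an hourly traffic profile, wrapping past midnight, and pick the window with the fewest requests. This is a minimal sketch; `quietest_window` is a hypothetical helper and the traffic numbers are made up, so your own curve will put the trough wherever your users actually sleep.

```python
def quietest_window(hourly_requests, length):
    """Return (start_hour, total) for the contiguous window of `length` hours
    with the fewest requests, wrapping past midnight."""
    n = len(hourly_requests)
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_requests[(start + i) % n] for i in range(length))
        if total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Hypothetical requests per hour; index 0 is midnight local time.
traffic = [120, 90, 70, 60, 65, 80, 150, 300, 500, 650, 700, 720,
           710, 690, 680, 660, 640, 600, 520, 450, 380, 300, 220, 160]

print(quietest_window(traffic, 2))  # (3, 125): 3am-5am is the trough here
```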


Why not have engineers in different time zones?

12:30am EST is 3:30pm AEST (Australian Eastern Standard Time) and 5:30am in London.

That way you have skilled people awake at all hours should a major issue develop. These people can fix problems while US-based clients are sleeping, and non-US clients aren't stuck wasting a day on less severe faults because the engineers only work 9 to 5 PST.
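Planning a follow-the-sun handoff mostly comes down to rendering one instant in each team's zone. A minimal sketch using fixed offsets (EST = UTC-5, GMT = UTC+0, AEST = UTC+10); it deliberately ignores daylight saving time, including Sydney's, which runs during the northern winter, so treat the results as standard-time values only. The `wall_clocks` helper and the date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets; daylight saving time is deliberately ignored.
ZONES = {
    "EST": timezone(timedelta(hours=-5), "EST"),
    "GMT": timezone(timedelta(0), "GMT"),
    "AEST": timezone(timedelta(hours=10), "AEST"),
}


def wall_clocks(moment):
    """Show one instant as local wall-clock time in every team's zone."""
    return {name: moment.astimezone(tz).strftime("%H:%M") for name, tz in ZONES.items()}


handoff = datetime(2024, 1, 15, 0, 30, tzinfo=ZONES["EST"])  # hypothetical date
for name, clock in wall_clocks(handoff).items():
    print(f"{name}: {clock}")
# EST: 00:30
# GMT: 05:30
# AEST: 15:30
```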



