
Software has bugs; that's not really the damning part. The damning part is that in four hours and two levels of support teams, there was no one who actually knew enough about how the system worked to remove the problematic flight plan so that the rest of the system could continue operating!

What exactly is the point of these support teams when they can't fix the most basic failure mode (a single bad input)?




Unfortunately, I work on a reasonably modern ERP system which has been customized significantly for the client and also works with a wider range of client-specific data combinations than the vendor seemingly anticipated or other clients have.

What it means is that on a regular basis, teams will be woken up at 2am because a batch process aborted on bad data, AND it doesn't tell you which data caused the abort or where in the process it happened.

The only option is to rerun the process with crippling traces enabled, manually review the logs to find the offending data, remove it, and then re-run the program again (hopefully remembering to turn the trace off :).

Even when all goes to plan, this can at times take more than 4 hrs.

Now, we are not running a mission-critical real-time system like air traffic control, and I'm in NO way saying any of this is good; but it may not be the case that "two levels of support teams didn't know anything". The system could just be so poorly designed that, even with the best operational experience and knowledge, it still took that long :-< .

On HN, we take a certain level of modernity, logging, failure states, messaging, and restartability for granted, which may not be even remotely present on more niche or legacy systems (again, NOT saying that's good; just indicating the issue may be less about operational competence than about design). It's easy to judge from our external perspective, but we have no idea what was presented or available to the support teams, or what their mandatory process is.


Just guessing:

They bought software from a third party and treat it as a "black box". There are a few known ways that the software fails, and the local team has instructions on how to fix those. But if it fails in an unexpected way, good luck: it's impossible for the local team to identify and fix the problem without the vendor.

The reason it took so long was that they realized too late that they needed to call the vendor.

You probably have to blame the managers rather than the engineers on the support team.


Considering this same failure has happened a few times in recent memory, maybe it's overly optimistic of me to expect an entry on the support wiki or something.


> Considering this same failure has happened a few times in recent memory

Which previous instances are you thinking about?


One important software engineering skill that is often overlooked is the art of writing just the right amount of logging: enough information to debug easily when things go wrong, but not so verbose that it gets ignored or pruned in production.
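
A minimal sketch of that balance, assuming a generic batch job rather than any real system (`process_records` and `handle` are hypothetical names): per-record chatter goes to DEBUG, which is usually switched off in production, while a failure is always logged at ERROR together with the record that caused it.

```python
import logging

logger = logging.getLogger("batch")


def process_records(records, handle):
    """Run handle() on each record and return the indexes that failed.

    Hypothetical helper for illustration only.
    """
    failed = []
    for index, record in enumerate(records):
        # Routine detail: only visible when DEBUG is enabled.
        logger.debug("processing record %d: %r", index, record)
        try:
            handle(record)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback,
            # so the offending record is identified without a traced rerun.
            logger.exception("record %d failed: %r", index, record)
            failed.append(index)
    # One summary line per run, cheap enough to keep in production.
    logger.info("processed %d records, %d failed", len(records), len(failed))
    return failed
```

With the logger left at its default level, a run stays quiet unless something breaks, and the one message that does appear names the bad record.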


And when did you last test your monthly backups? But seriously. If you fill out all the positions in an org chart it's easy to think you're delivering, and for a lot of situations it usually works. Anointing someone a manager usually works out because people can muddle through. It doesn't work in medicine, or as it turns out, air traffic control.

Lesson learned for about the next ~5 years.


I wouldn't expect level 1 and level 2 to be able to diagnose a problem like this.

Level 3 (devs) should have been brought in much quicker, though.


Having worked in tech support: level 3 (Devs) should have described their source code structure to level 2, and let them access it when they needed it.


You don’t need a complete diagnosis if you can spit out enough debug info that says, “oops shat the bed while working with this flight plan”, then the support people can remove the one that’s causing you to fail, restart the system, and tell ATC to route that one manually.
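
Something like this, as a rough sketch only: it assumes plans arrive as a dict keyed by ID and that setting one aside is acceptable, which is a safety decision I can't make for a real ATC system. `process_flight_plans`, `handle` and `RunResult` are made-up names, not anything from the actual software.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    processed: list = field(default_factory=list)    # IDs handled normally
    quarantined: list = field(default_factory=list)  # (plan ID, error text)


def process_flight_plans(plans, handle):
    """Process every plan; set failing ones aside instead of aborting the run.

    Hypothetical sketch: `plans` maps plan ID -> plan data, `handle` is the
    per-plan processing step and may raise on bad input.
    """
    result = RunResult()
    for plan_id, plan in plans.items():
        try:
            handle(plan)
        except Exception as exc:
            # Record which plan failed and why, then keep going.
            result.quarantined.append((plan_id, f"{type(exc).__name__}: {exc}"))
            continue
        result.processed.append(plan_id)
    return result
```

The quarantined list is what support would read: it names the plan to pull and hand over for manual routing, without anyone needing a full diagnosis first.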


> What exactly is the point of these support teams when they can't fix the most basic failure mode (a single bad input...)

To collect money on support contracts, I suspect.


Try to get developers who love to code and create to stay on a support team and be on an on-call roster. I betcha at least half will say no, and the other half will either leave or you'll run out of money paying them.


They were probably on vacation.



