Unfortunately, I work on a reasonably modern ERP system which has been customized significantly for the client, and which also has to handle a wider range of client-specific data combinations than the vendor seemingly anticipated or than other clients have.

What this means is that, on a regular basis, teams get woken up at 2am because a batch process aborted on bad data, AND it doesn't tell you which data or where in the process it aborted.

The only option is to rerun the process with performance-crippling traces enabled, manually review the logs to find the bad data, remove it, and then re-run the program again (hopefully remembering to turn the trace off :).

Even when everything goes according to plan, this can at times take more than 4 hours.

Now, we are not running a mission-critical real-time system like air traffic control, and I'm in NO way saying any of this is good. But it may not be the case that "two level of support teams didn't know anything" - the system could just be so poorly designed that, even with the best operational experience and knowledge, it still took that long :-< .

On HN, we take a certain level of modernity, logging, failure states, messaging, and restartability for granted, which may not be even remotely present on more niche or legacy systems (again, NOT saying that's good; just indicating the issue may be less about operational competence than about design). It's easy to judge from our external perspective, but we have no idea what was presented to or available to the support teams, or what their mandatory process is.
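
For contrast, here is a rough sketch of the kind of behavior I mean by "taken for granted": a batch loop that reports which record failed and where, sets it aside, and keeps going. Everything here (`run_batch`, `process`, the record shape) is made up for illustration; it is not how our ERP or any particular vendor's system works.

```python
import logging

logger = logging.getLogger("batch")


def run_batch(records, process, start_at=0):
    """Process records in order, recording exactly which ones failed.

    Hypothetical helper: returns (next_index, quarantined) so that a
    later run could resume from next_index instead of starting over.
    """
    quarantined = []
    for index in range(start_at, len(records)):
        record = records[index]
        try:
            process(record)
        except Exception as exc:
            # Log the position and the offending data, then keep going
            # instead of aborting the whole batch.
            logger.error("record %d failed: %r (%s)", index, record, exc)
            quarantined.append((index, record, str(exc)))
    return len(records), quarantined


# Made-up sample data: the second record has a non-numeric quantity.
records = [{"qty": "1"}, {"qty": "x"}, {"qty": "3"}]


def process(record):
    int(record["qty"])  # raises ValueError on bad data


next_index, bad = run_batch(records, process)
```

With something like this, the 2am page would at least say "record 1, qty is 'x'" instead of requiring a traced rerun to find out.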