Status pages are the worst of all things to work on.
First, deciding what to put on them is PM hell. Is it customer-facing? Developer-facing? Both?
Then, for each item, how do you test it so you can be really sure? Half the time the tests produce either false negatives or false positives. If your test writes data (like, say, posting a comment), is it a meaningful test if it posts to a test or demo topic? But if it posts to real topics, the test itself becomes user-impacting.
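For illustration, here's a minimal sketch of what such a write probe might look like: post a uniquely tagged comment to a dedicated test topic, then check it comes back on the read path. The endpoint paths, BASE_URL, and TEST_TOPIC_ID are all made up; the point is only the shape of the check, and the awkward tradeoff that it exercises a test topic rather than real ones.

```python
# Hypothetical write probe: post a comment to a dedicated test topic and
# verify it is readable again. All endpoints and IDs here are invented.
import time
import uuid
import requests

BASE_URL = "https://forum.example.com"
TEST_TOPIC_ID = 12345  # a "status probe" topic, not a real discussion

def probe_comment_roundtrip(timeout_s: float = 10.0) -> bool:
    """Write a uniquely tagged comment, then poll the read API for it."""
    marker = f"status-probe-{uuid.uuid4()}"
    resp = requests.post(
        f"{BASE_URL}/api/topics/{TEST_TOPIC_ID}/comments",
        json={"body": marker},
        timeout=5,
    )
    if not resp.ok:
        return False  # write path failed outright

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        page = requests.get(
            f"{BASE_URL}/api/topics/{TEST_TOPIC_ID}/comments", timeout=5
        )
        if page.ok and marker in page.text:
            return True  # write and read paths both worked
        time.sleep(1)
    return False  # write "succeeded" but the comment never became readable
```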
Or you can use traffic stats to estimate whether things are going well, but then your traffic stats are themselves in question. Something may report zero traffic when things are great, or normal traffic when things are broken.
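A sketch of that traffic-stat approach, with the ambiguity baked in. The thresholds here are arbitrary, and where the per-minute request counts come from (your metrics pipeline) is exactly the part you can't fully trust:

```python
# Hypothetical traffic check: compare recent per-minute request counts
# against the same window from a historical baseline (e.g. a week ago).
from statistics import mean

def traffic_looks_healthy(recent: list[int], baseline: list[int]) -> bool:
    """recent/baseline are per-minute request counts for comparable windows."""
    if not recent or not baseline:
        return False  # no data at all is itself suspicious
    recent_rate = mean(recent)
    baseline_rate = mean(baseline)
    if baseline_rate == 0:
        return recent_rate == 0  # nothing meaningful to compare against
    ratio = recent_rate / baseline_rate
    # Call it "healthy" if traffic is within 50%-200% of the baseline.
    # The failure modes from the comment above still apply: a broken
    # metrics pipeline can report zero traffic while users are fine
    # (false alarm), and traffic can look normal while every request
    # fails downstream (false all-clear).
    return 0.5 <= ratio <= 2.0
```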
Then you have the classic "the tests are failing because the testing infra is broken, but production users are fine."
There just isn't a good answer to most of the problems status pages face. So, yeah, they suck, but I think there may be a Gödel-style argument here that it is not possible for them to be timely, meaningful, and accurate all at once.
Most of this is about automated status pages; it doesn't apply to status pages where the incident response team manually says "yes, there is a problem" and "OK, all problems are fixed".
True, but those teams have the same problem. How do they know it's really fixed, for every scenario, for everyone, everywhere?
They have a pretty good idea if something internal is broken, but it's very hard to know that everything is fixed.
That said, I agree that manually updated pages are generally more useful and accurate than automated ones. It's just an extra tax on the incident team.