Slack issues

m348e912 · on May 17, 2023

Slack is definitely down even though their status page says otherwise.

Incentives matter. If there is no penalty for misrepresenting the current state of an outage, you are going to find that company driven status pages are going to be frequently inaccurate.

zamnos · on May 17, 2023

In fact, there's an inverse incentive at play here. Slack is on the hook for missed SLA, so any admission of issue is legally and financially culpable. This means they don't want to admit to an issue until it's impossible to deny there is one.

Hamuko · on May 17, 2023

There is also an incentive to display the proper status, since if you show all green when the service is down, you look dumb.

We recently improved our status reporting because our old status system might not accurately indicate if only a part of our service went down, so it might show green even though (some) customers definitely saw very obvious downtime.

capableweb · on May 17, 2023

When the choice is between temporarily "looking dumb" and permanently loosing a bunch of money because of SLAs, companies like Salesforce usually make the latter choice.

ryanbrunner · on May 17, 2023

Usually this is less being outright deceptive (saying you're up when you know you're down), and more wrapping the definition of "down" in legalese so that nothing short of a complete system outage of every aspect of your service for 100% of users counts as "down".

In those cases status pages are more likely to have a lovely spectrum of yellows and oranges to choose from, but a distinct absence of red, but very few companies will say that they're fully green when they know there's issues. Green when you're down is usually more of a symptom of process problems in how outages get reported or resolved (no documentation, manual processes for updating, long communication chains from support teams to engineers).

inferiorhuman · on May 17, 2023

Nah the issue isn't looking dumb, the issue is looking like a liar. What good is an SLA if your reputation is that you're going to lie to avoid paying out?

rwaksmunski · on May 17, 2023

This is Salesforce we are talking about. They topped Stack Overflow as the most dreaded technology so they bribed SO not to include them anymore. They don't care if they look like liars to technical people. Technical people are not the decisions makers in their target customer base.

Volundr · on May 17, 2023

> They topped Stack Overflow as the most dreaded technology so they bribed SO not to include them anymore.

Wait what? And SO agreed? Where can I learn about this?

Hamuko · on May 17, 2023

I feel like customers treating you like a bunch of fucking idiots is not gonna be a great boost to your bottom line either. Maybe if you're in a market sector where reputation doesn't matter and you don't have any competition swooping in.

thecopy · on May 17, 2023

The people signing the contract with Slack is not looking at the status pages, generally.

bombcar · on May 17, 2023

The people signing contracts ONLY look at the status page, and say everything’s fine. They don’t bother verifying it signing in.

Timon3 · on May 17, 2023

> There is also an incentive to display the proper status, since if you show all green when the service is down, you look dumb.

How many customers will think you look dumb? If you admit downtime, will the accurate reporting drive away more new customers, than the inaccurate reporting drives away due to you looking dumb?

"Looking dumb" is not a reasonable factor. Damage to your public image is, but that's divorced from looking dumb. It's entirely possible Slack is saving money by not admitting their downtimes, because it leaves a better image.

wdr1 · on May 17, 2023

And yet, this generally doesn't happen.

Slack now has the outage listed:

https://status.slack.com/calendar

The online cynicism is something I've never understood. It assumes the world is filled first first-order simpletons & somehow only the poster rises above it.

People aren't dumb & companies aren't dumb. People will notice a status tracker isn't accurate. They will have reduced confidence in the company. Companies know people will notice & think this way, so they generally don't engage in this type of deception. I'm sure there are exceptions, but in general, status pages have tended to be accurate in my experience.

zamnos · on May 17, 2023

You can't measure "reduced confidence" like you can a dollars going out for not meeting an SLA, and what are you going to do in this case? Move to Microsoft Teams? Please.

Chris2048 · on May 17, 2023

Their status page is not a report of status, but rather a report of admission.

rozenmd · on May 17, 2023

We noticed about 2 minutes of hard downtime (API and website) on OnlineOrNot: https://slack.onlineornot.com/

01acheru · on May 17, 2023

Another status page that sucks. Slack goes down, people start texting me about it, status page is green, HN informs me that Slack is actually down.

Next time I build a status page it will simply be a static HTML with green indicators and some random metrics, looks like it is the industry standard. :facepalm:

sjsdaiuasgdia · on May 17, 2023

Progression of status pages, from experience at a large cloud provider...

Stage 1: Status is manually set. There may be various metrics around what requires an update, and there may be one or more layers of approval needed.

Problems: Delayed or missed updates. Customers complain that you're not being honest about outages.

Stage 2: Status is automatically set based on the outcome of some monitoring check or functional test.

Problems: Any issue with the system that performs the "up or not?" source of truth test can result in a status change regardless of whether an actual problem exists. "Override automatic status updates" becomes one of the first steps performed during incident response, turning this into "status is manually set, but with extra steps". Customers complain that you're not being honest about outages and latency still sucks.

Stage 3: Status is automatically set based on a consensus of results from tests run from multiple points scattered across the public internet.

Problems: You now have a network of remote nodes to maintain yourself or pay someone else to maintain. The more reliable you want this monitoring to be, the more you need to spend. The cost justification discussions in an enterprise get harder as that cost rises. Meanwhile, many customers continue to say you're not being honest because they can't tell the difference between a local issue and an actual outage. Some customers might notice better alignment between the status page and their experience, but they're content, so they have little motivation to reach out and thank you for the honesty.

Eventually, the monitoring service gets axed because we can just manually update the status page after all.

Stage 4: Status is manually set. There may be various metrics around what requires an update, and there may be one or more layers of approval needed.

Not saying this is a great outcome, but it is an outcome that is understandable given the parameters of the situation.

croes · on May 17, 2023

More like this

https://www.hasthelargehadroncolliderdestroyedtheworldyet.co...

viraptor · on May 17, 2023

There are two kinds of status pages. Those that show you red when the service is up (automated) and those that show you green when the service is down (manual and automated) - or some mix of those two. You don't want made up information too, so for the manual one you'll get some delay for the analysis of the situation.

Your actions are extremely unlikely to change if the downtime is for less than half an hour. So what exactly do you expect to happen here?

(Yes, I got annoyed at a thousandth comment that essentially says the status update is not instantaneous and perfectly reflecting the situation)

ilyt · on May 17, 2023

I expect:

* not lying

* not having to dig out our logs and proofs for SLA reasons on obvious fuckup.

> (Yes, I got annoyed at a thousandth comment that essentially says the status update is not instantaneous and perfectly reflecting the situation)

It's not about it being minute behind reality, it's about it lying in entirety. Why you have so much problems with understanding that ?

viraptor · on May 17, 2023

There's a bit of time between a delayed status report and lying. If you referred to them historically not reporting anything at all then that's fair, but it wasn't clear from the message.

And it's a common comment I see 10min after some service having issues: "the status page is still green" - yes, people are probably still logging in to things and figuring out if the issue actually is internal.

ilyt · on May 18, 2023

But the expectation is that site reports live state.

If it is up to person to change it then it as status page is useless, might as well look at reddit whether people are complaining...

efficax · on May 17, 2023

status pages are rarely automated and it looks like slack was down at like 1am pacific. somebody got woken up by a page and groggily escalated and they sat there fighting the outage for 30 minutes before someone said “what about the status page”. or at least that’s how it worked at my last company

stevewodil · on May 17, 2023

This is very accurate. Or the customer communication lead was shadowing and this was their first incident.

It’s all the same

switch007 · on May 17, 2023

If you accept that status pages are (partially) under the control of PR teams, it makes more sense that they’re useless and a lie.

nurettin · on May 17, 2023

Status page pull request is pending merge, because seniors are either fired or or overworked.

yosito · on May 17, 2023

Finally! A chance to get some work done, undisturbed!

rohith2506 · on May 17, 2023

And yet here we are commenting on HN ;)

LtWorf · on May 17, 2023

plugging my IRC gateway https://github.com/ltworf/localslackirc

I find that using slack from IRC, and having the option to deprive some people/channels to notify me is very helpful to reduce the amount of distractions.

aestetix · on May 17, 2023

Nice! How does this compare to wee-slack?

LtWorf · on May 17, 2023

Well most importantly it lets me use the IRC client I want to use (which isn't weechat), and I can install it via apt.

featurewise I really have no idea. I just use localslackirc.

floriangosse · on May 17, 2023

It seems to be back.

a012 · on May 17, 2023

The website and app are not accessible (HTTP 503) but their status page is still green :smile:

floriangosse · on May 17, 2023

Yeah, saw the same. Let's see how long it will take and when they update the status page.

antupis · on May 17, 2023

Well status pages are nowadays much worse indicator than HN or other sources because companies just don't update those.

mastazi · on May 17, 2023

Not down for me. OP's link takes me to Slack's homepage.

floriangosse · on May 17, 2023

I was down at the moment I've posted it. As I wrote in another comment – it's back available for me as well.

floriangosse · on May 17, 2023

For those who only rely on Slack's status page here are some other indicators (excl. HN):

* Twitter: https://twitter.com/search?q=%40SlackStatus

* Downdetector: https://downdetector.com/status/slack/

yla92 · on May 17, 2023

https://status.slack.com/ is not updated yet.

miyuru · on May 17, 2023

Its been 2 hours and the page still does not show any downtime. Whats the point of having a status page?

terom · on May 17, 2023

Digging into the status page history reveals a reference to an issue lasting for 7 minutes, caused by a "minor technical change" that was rolled back. [1]

Still claims 100% uptime for May.

https://status.slack.com/2023-05-17

spuz · on May 17, 2023

The post should probably link to the status page (which shows all systems ok at the moment): https://status.slack.com/

MollyRealized · on May 17, 2023

... or not.

petepete · on May 17, 2023

There was definitely a 20 minute window where it was down for us. Messages were occasionally getting through but often in the wrong order, and many just failed.

floriangosse · on May 17, 2023

Yeah, same for me. Most of the messages I tried to send failed and I had to retry sending them at a later point.

blitzar · on May 17, 2023

Working fine for me. I think you might be holding it wrong.

floriangosse · on May 17, 2023

I was down at the moment I've posted it. As I wrote in another comment – it's back available for me as well.

onion2k · on May 17, 2023

Based on the notifications I've been getting all morning, I can say for sure it isn't.

4dregress · on May 17, 2023

Yep, its down for me as well.

swah · on May 17, 2023

Working normally for us.

Traubenfuchs · on May 17, 2023

You are wrong, their status page clearly says

> Uptime for the current quarter:

> 100%

Rafi993 · on May 17, 2023

Weird I didn't experience any issues today

j4nek · on May 17, 2023

productivity for adhd people now increase :)

re-thc · on May 17, 2023

Slack is just slacking off.

codetrotter · on May 17, 2023

Works fine for me I think

ilrwbwrkhv · on May 17, 2023

slack is bitchware. a new term i use these days for software which is forced on employees because they have no other option (hence bitches). other examples of bitchware are microsoft teams, jira etc.

roofio · on May 17, 2023

Could it be that they actually serve a valuable purpose? If they didn't exist would you bitch about email being bitchware?

darkerside · on May 17, 2023

Would a decent status page be an actually intelligent use of blockchain? We could have a global network of computers responsible for determining a consensus on whether a service is truly and actually down or not. It could be captured in an independent ledger, and ideally used as a canonical determination for status pages, SLA disputes, etc.

CPLX · on May 17, 2023

The year is 2071.

They built a house of straw. The thundering machines sputtered and stopped. Their leaders talked and talked and talked. But nothing could stem the avalanche. Their world crumbled.

The cities exploded. A whirlwind of looting, a firestorm of fear. Men began to feed on men.

On the roads it was a white line nightmare. Only those mobile enough to scavenge, brutal enough to pillage would survive. The gangs took over the highways, ready to wage war for a tank of juice. And in this maelstrom of decay, ordinary men were battered and smashed.

Except for one man armed with an AK-47, and a Honda full of silver. As he stood on the bluff, looking down at the desert spread for miles before him, a man approached, darkly.

“Instead of silver bars, have you considered blockchain for this?”

ryanbrunner · on May 17, 2023

downdetector already does basically this effectively enough and is dramatically simpler in terms of technology. You don't really need to be overly complex about consensus for things like this - if a sufficiently large population reports something down, it's down, because the stakes aren't high enough for enough bad actors to be an issue.

Traubenfuchs · on May 17, 2023

You do not need a blockchain for this.

luplex · on May 17, 2023

I think a normal company would be better suited

stevefan1999 · on May 17, 2023

what if the blockchain itself was in an inconsistent state due to 51% attack (also known as split brain consensus)

ryanbrunner · on May 17, 2023

This is compounded by the fact that there's no objective truth about whether something is down (as evidenced by this thread, outages are often not global or not across the entire service)