I debugged for a minute at 11:59 trying to push, then my eat-lunch notification came in at 12:00 and I opened Hacker News with a tuna sandwich. This is super helpful, because it means I won't need to debug locally for 10 minutes before figuring out GitHub is down.
Edit - When I say "opened hackernews with a tuna sandwich", I want to clear up that I did indeed full-on mash the keyboard letters with my sandwich. It's costing me a fortune in keyboards every day, and it's ruining my sandwich most days as well. I think I have an issue.
Well, from another perspective - a bootstrapping perspective, and hackernews does like a bootstrapping perspective - I could make a case that a good meal could be licked out of most keyboards, saving money on lunch once a week.
The GitHub status page shows 14 incidents affecting Git Operations this year alone [1]. That's quite a lot, considering it's only May. I wonder if the outages were always this frequent and just get more publicity on here now, or whether there was a significant increase in outages fairly recently.
Many outages happen because something changed, and someone/something missed one of the effects of said change, bringing the platform down immediately, or after a while.
There was a period of time when GitHub basically didn't change, for years. And the platform was relatively stable, although "unicorns" (downtime) still happened from time to time.
But nowhere near as often as now. Then again, there are a lot more moving pieces now compared to before.
Interested to hear whether anyone actually managed to get some Client Credits as per their SLA [1]? Over the last quarter they probably dropped below 99.9% in some services.
About 10 years ago someone said we should move to self-hosting because Bitbucket, which we used, was unreliable. I looked at the status page and saw 2 hours of downtime over 3 months, while we had 3-4 days of downtime on our self-hosted Jenkins during the same period. I always think of that when I see people complain about services being unreliable. Often we see one or two problems in a short span and forget about the months where we didn't see any issues.
GitHub is probably as reliable now as it has been for the past 10 years. It's always had downtime.
> Let’s hope it’s temporary and GitHub error 500 won’t become their own version of Blue Screen of Death. In this case it would be Green Screen of Death (GSoD or GhSoD).
It's a surprisingly unreliable service. It's been great for code management / reviews. But I can't imagine relying on it as the only option for deployments via CD. Imagine needing to deploy an important bug fix or a big project with a release date, but you can't because GitHub is having an outage.
Once again another GitHub incident, just 4 days after the last one [0]: GitHub Actions goes down.
You are better off self-hosting at this point, rather than centralizing everything on GitHub [1], as it has been chronically unreliable for years, ever since the Microsoft acquisition.
We switched to self-hosted Gitea last month, no regrets. Only the CI story could be a bit better. We're currently using Woodpecker but need macOS runners, and Woodpecker's "local" agent implementation is still unstable. I'm watching Gitea Actions' progress with great interest.
I'm the lone person on my team who still believes in keeping most of our stuff local, with online versions primarily as backup.
Every time some global service goes down, or our internet/intranet goes down, or there's a security breach, or a WFH person has a power outage, I'm reminded that I'm right.
I'm no luddite, but these services make you dependent on them. The worst thing I'm dependent on here is a bad computer. We have backups and keep our files on our own network, so it seems fine. We are slowly moving to an online system, and I'm constantly reminded of all the problems of shifting online.
Meanwhile, if I had a Linux server, we would be in control of our own destiny.
Git is actually great for keeping distributed copies of code. With a bit of bash, you can easily cycle through a list of backup URLs for a git repo, looking for updates.
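A minimal sketch of that idea. The mirror URLs here are placeholders, not real repos; the function just tries each one in order and fetches from the first that answers:

```shell
#!/bin/sh
# Placeholder mirror list - substitute your actual backup remotes.
MIRRORS="git@github.com:acme/foo.git git@gitlab.com:acme/foo.git"

fetch_from_first_reachable() {
    for url in $MIRRORS; do
        # `git ls-remote` is a cheap reachability check: it lists the
        # remote's refs without downloading any objects.
        if git ls-remote "$url" HEAD >/dev/null 2>&1; then
            echo "fetching from $url"
            git fetch "$url" && return 0
        fi
    done
    echo "no mirror reachable" >&2
    return 1
}
# usage: fetch_from_first_reachable
```

Run it from cron (or a pre-build hook) and your local clone keeps tracking whichever mirror is up.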
Perhaps everyone should stop complaining and be thankful for a chill morning. You can't create a PR right now - go get a pastry and some fresh air. Be in the moment for once. It's beautiful outside.
GitHub outages aren't nearly long or often enough to consider this. Git is distributed, just keep working locally until GitHub is back up. GitHub outages are nowhere near the threshold of pain I'd require to introduce a second Git hosting provider to the mix.
Really, GitHub outages barely hurt at all. It's not like an AWS or Cloudflare outage which is more likely to be a production disaster. Every outage a bunch of people on HN start screaming about owning their own on-prem destiny or wondering why we're still on GitHub. Nothing changes because it's not nearly as bad as those people are making it out to be. Life is all about tradeoffs.
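For what it's worth, if the pain ever does cross that threshold, the second-provider setup is cheap: one remote with two push URLs, so every push updates both hosts. A sketch, with two local bare repos standing in for the hosting providers:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Two local bare repos stand in for github.com and a backup host.
git init -q --bare github.git
git init -q --bare backup.git
git init -q -b main work && cd work
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
# One remote, two push URLs: a single `git push` updates both hosts,
# and either one can serve fetches if the other is down.
git remote add origin "$tmp/github.git"
git remote set-url --add --push origin "$tmp/github.git"
git remote set-url --add --push origin "$tmp/backup.git"
git push -q origin main
```

After this, both bare repos contain the same `main`; with real hosts you'd use their SSH/HTTPS URLs instead of local paths.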
Github has been down hundreds of times this year alone.
They have reported outages 72 times this year and there are multiple times when services are unavailable and they don't report it on the status page.
> there are multiple times when services are unavailable and they don't report it on the status page.
There's no evidence that the exact same doesn't happen with GitLab. I've had it (consistently) 500 on me in the past when there's nothing on their status page to indicate any issues.
That's not the point of discussion. I didn't say GitLab doesn't lie about it, or heck, that it doesn't have worse uptime than GitHub.
My argument is that a company erasing a 300 GB production database once is not a stain on its competency, and that it cannot be compared to a company that has very frequent outages and also happens to lie about them.
gitlab.com is implied, since an incident on a self-hosted instance would have nothing to do with GitLab as a service (they can't be responsible for your on-site backups).
> Trying to restore the replication process, an engineer proceeds to wipe the PostgreSQL database directory, errantly thinking they were doing so on the secondary. Unfortunately this process was executed on the primary instead. The engineer terminated the process a second or two after noticing their mistake, but at this point around 300 GB of data had already been removed.
Ah I see the link. I'd caution that many people choose between github.com, gitlab.com, and gitlab self-hosted. The reliability of self-hosted gitlab is meaningful, especially when operated competently. People need to know if there are safeguards or foot guns. Backups alone can't prevent data loss.
The point isn't that GitLab has more, the point is that running these things at global scale is pretty complicated, and everyone has problems. "Just switch to GitLab" is pithy but isn't in itself an actual solution.
You can self-host GitLab and have few incidents, if any, and those get resolved very quickly. I worked for a company that had no incidents that I observed in ~3 years, and now work at a company that has had ~2 incidents in 1.5 years.
We have a self-hosted Premium instance and have 30min of downtime _every day_ while the database is frozen and backed up. We've been told that it's a known issue being discussed with GitLab but that could just be CYA. But in any case, it's the "at scale, while changing" that tends to cause problems.
Perhaps this is a continuing argument for self-hosting, especially if you don't have to expose the instance publicly. But then, if that's an option, you can also self-host GitHub (though I have heard fewer anecdotes about the stability of that).
GitLab is quite a bit more expensive. If you have GitHub Enterprise with the security features, it's $70/month/user whereas you'll need to get GitLab Ultimate for the security features, which is $99/month/user.
From the outside, it appears GitHub doesn't have any internal sharding going on.
Outages always affect _all_ repos.
Architecturally this seems rather suboptimal?
E.g., AWS doesn't roll out changes globally - they start with an internal shard within a region and progressively roll out to more shards and more regions.
    remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
    remote: fatal error in commit_refs
    To github.com:acme/foo.git
     ! [remote rejected] HEAD -> acme/foo (failure)
    error: failed to push some refs to 'github.com:acme/foo.git'
If that makes you mad, I still need help with https://github.com/MichaelMure/git-bug ;-)
Coming at some point, kanban and pull-request support, offline-first!
You're always relying on third parties. Always. Except if you run it locally. We're way beyond that. I deployed to production just fine. It's just a helper. It adds to the stress tho.
Can anyone from GH weigh in on this? We've had several major outages from GH over the last month or two, and the company has been completely silent on the causes, as well as any sort of remediation steps to fix stability.
As a somewhat large org, we're now exploring other options for code hosting.
Setting aside the fact that, of what people actually do with GitHub, git is such a small part. Issues, PRs, CI/CD - basically everything that isn't git doesn't happen over git (besides the wiki, which somehow miraculously actually is via git).
Some people have their entire roadmap in GitHub, and every single bug report / feature request, without any offline backup. Don't ask me why, I don't get it. Especially since they have proven for the last few years that they cannot keep the platform up in a stable manner.
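The git side, at least, is easy to back up offline. A sketch, with a local repo standing in for the hosted one (issues and PRs would still need the API, which is exactly the problem):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# A local repo stands in for the hosted upstream (placeholder for a real URL).
git init -q -b main upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "roadmap v1"
# --mirror copies every ref (branches, tags, notes); re-running
# `remote update` from cron keeps the backup fresh.
git clone -q --mirror upstream roadmap-backup.git
git --git-dir=roadmap-backup.git remote update --prune
```

Since the wiki is itself a git repo, the same trick covers it; the tracker data is what's left stranded.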
I mean, you joke but that's actually fairly true. P4 users always notice when the central server goes down because you can't reliably look at changelist history, draft CLs, and do a host of other operations that are possible on git locally. (using a central VCS confers other advantages of course).
No I'm not. This outage affects GitHub, not git itself - but if you're storing your git repos (and automation) on GitHub, then you cannot git clone, push, etc. from or to them - all of which are critical to CI/CD.
They are adding affected services to the status entry title (started with Issues, Actions, Operations). Can't even do a simple push due to this so-called "degraded performance".
I've started working on a Forgejo instance for myself (a Gitea fork). It's honestly disappointing how bad GitHub's uptime has gotten. I hope they get their stuff together.
Due to GitHub's chronic unreliability, it is guaranteed to continue happening every month.
Looks like the advice to avoid 'centralizing everything to GitHub' has aged very well [1], and at this point you would get better uptime by self-hosting instead of using GitHub.
Just ask the many open source organizations like RedoxOS, ReactOS, WireGuard, GNOME, KDE, etc.