I had checked that page not long ago, and as far as I remember there were many "red" or "orange" days in the past 3 months. Now it's all green. That's concerning
I clicked through wayback machine and couldn’t see any strong indicators that uptime had been rewritten, but there are a lot of snapshots. If you can prove it, I’m interested.
This has gotten to a point where it doesn't really matter anymore. When a service crosses a certain reliability threshold it's like a phase change. The customer base eventually adapts to the situation. Anyone who still genuinely cares has moved to self hosted enterprise or something else by now. It was most tenuous for me when they almost met the SLA. Now that they've blown so far beyond it, the stress is mostly gone.
I care but I can't move out because it's Orders from Up High which migrated us to Github. We haven't been here a year and it's not worth my neck to mirror our code in some kind of skunkworks forge instance. Woe. Woe is me.
Our Big Giant Enterprise wants to move all our repos from ADO to GitHub for all the nifty AI features, but I'm told the frequent downtime is a major issue so we're slow-walking the migration.
We have so many automated workflows and pipelines moving through Github Actions + other Github integrations it would be a giant headache to migrate. Not clear where we would go either. Gitlab??
The Google SRE book offers the following as one of the reasons to not gun for 100% reliability (emphasis added):
> users typically don’t notice the difference between high reliability and extreme reliability in a service, because the user experience is dominated by less reliable components like the cellular network or the device they are working with. Put simply, a user on a 99% reliable smartphone cannot tell the difference between 99.99% and 99.999% service reliability!
I've had a shaky relationship with my ISP of late. What brought me to this thread today is that I couldn't push to Github. Notably this isn't covered by their downtime report so, going by the available facts, it's _probably_ not Github's fault I couldn't push; I've also just been on my daily stand-up call, where I kept getting disconnected.
But looking beyond today's available facts, odds are there's a bigger problem GH is not mentioning in their status page. They say the current incident has to do with "unauthorized users" and I wonder if pushing a commit from my IDE client counts as an operation from an "unauthorized user" as I still have to authorize with my SSH key.
It's just insane that I can't decide which of Github or German o2 should be the more reliable service!
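The arithmetic behind the SRE book quote can be checked with a rough sketch. It assumes the device and the service fail independently and that both must be up for a request to succeed; `combined_availability` is a made-up helper name, not something from the book.

```python
def combined_availability(*components: float) -> float:
    """End-to-end availability of a chain where every component must be up.

    Assumption: failures are independent, so availabilities multiply.
    """
    total = 1.0
    for availability in components:
        total *= availability
    return total


phone = 0.99  # the "99% reliable smartphone" from the quote
for service in (0.9999, 0.99999):
    end_to_end = combined_availability(phone, service)
    print(f"service {service:.5f} -> end to end {end_to_end:.6f}")
```

Both cases come out at roughly 98.99% end to end, which is the point the quote is making: the extra nine on the service side is swamped by the 99% device.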
"unauthorized" is a bit different than "unauthenticated". The former suggests trying to access something you don't have permission for while the latter suggests you're just not logged in.
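In HTTP terms the distinction roughly maps onto 401 versus 403, with the confusing wrinkle that the 401 status is itself named "Unauthorized" even though it means "not authenticated". A minimal sketch, where `access_status` and its two flags are hypothetical and say nothing about how GitHub actually decides:

```python
from http import HTTPStatus


def access_status(is_authenticated: bool, has_permission: bool) -> HTTPStatus:
    """Pick a response status for a request to a protected resource."""
    if not is_authenticated:
        # Not logged in at all: HTTP calls this 401 "Unauthorized".
        return HTTPStatus.UNAUTHORIZED
    if not has_permission:
        # Logged in, but not allowed to touch this resource: 403 "Forbidden".
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


print(access_status(is_authenticated=False, has_permission=False).value)
print(access_status(is_authenticated=True, has_permission=False).value)
```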
At a guess, I could imagine some sort of failure of cached pages, which can be cached for signed out users but probably not for signed in users (as the rendered HTML would need to have user context like their avatar etc)
> Following investigation, we are seeing that impact is limited to unauthenticated users when accessing Pull Requests or Issues. Our team continues to work towards mitigation with more updates to follow as we have them.
Digital systems don't necessarily deteriorate immediately after the causal factors. Like technical debt, issues grow unnoticed and become visible gradually.
People seem to miss entirely that it's not (only) some slop code that makes github go down; it's the fact that they get 100x the number of requests since AI tools came into devs' daily workflow.
In my jobs over the last 3 years, I have seen two migrations from internal systems to GitHub. I would think that companies are doing this to cut costs. It's not something I am going to research, but it would be interesting if their recent issues were related to large migrations from in-house installations to github, and doubly so if that were related to how large companies have been tightening their spending in the past few years.
Only for a scale approaching github's; otherwise a gitea instance or whatever doesn't have any interdependent components other than the server you host it on, which won't have nearly as much downtime as github (though that's a low bar; a better way to phrase it would be that it would pretty much never be down).
Monday is probably the best day for an outage? Everyone's in the office, so you're not disturbing them, and hung over, so you're not hurting their productivity much.
API Requests with 4 nines of availability??
Issues with 99.96% uptime?
PR with 99.61% uptime last 90 days??
https://mrshu.github.io/github-statuses/ marks PRs at 95.89% in the same period as an example.
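For a sense of scale, those percentages can be turned into hours of downtime over the 90-day window. This is plain arithmetic on the figures quoted above (taking "4 nines" as 99.99%); `downtime_hours` is just an illustrative helper and says nothing about how either page measures uptime.

```python
def downtime_hours(uptime_percent: float, days: int = 90) -> float:
    """Hours of downtime implied by an uptime percentage over a window of days."""
    return (1 - uptime_percent / 100) * days * 24


for label, uptime in [
    ("API Requests, 4 nines", 99.99),
    ("Issues", 99.96),
    ("PRs, status page", 99.61),
    ("PRs, mrshu tracker", 95.89),
]:
    print(f"{label}: {uptime}% -> {downtime_hours(uptime):.2f} h down in 90 days")
```

The gap between the two PR figures is the difference between roughly 8 hours and roughly 89 hours of downtime in the same period.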
I think there are 3 big themes with this, though not
1. LLM tools have added considerable load.
2. LLMs used by developers to increase velocity seem to be leading to more outages. This calls the increased velocity into question.
3. Roadmaps focused on pushing things that don't address reliability problems, e.g. github moving to azure, or adding AI features.
All these same problems happen to orgs with other fads that aren't AI. Following fads is not good engineering.
Sure they can. If Google loads and Github doesn't, then it's clearly Github being down, not the mobile network.
Also not everyone uses a phone. My desktop & fibre internet has way better than 99% reliability.
Honestly it's pretty mad to see, especially without a crisp failover.
I know for a fact that ANY other platform would fail faster than github if they had the same volume of http requests.
Why do I keep seeing github down news on HN?
https://www.githubstatus.com/history
https://isgithubcooked.com/
Could not have happened on a worse day (Monday) and you can see how unreliable GitHub has been.
Better off self-hosting.
[0] https://news.ycombinator.com/item?id=48418183