Network Performance Issues in multiple locations (cloudflarestatus.com)
83 points by ryanlol on May 2, 2017 | 62 comments



Telia had problems reaching many service providers. Not a Cloudflare problem per se. Plenty of non-Cloudflare stuff was affected, like Reddit, AWS, Fastly, ...

Check out the dip in requests to Reddit: http://www.redditstatus.com/


Yes, we were affected by this: https://status.fastly.com/incidents/3j0pnly3gvqb

Fastly was on top of it and routed around the issue quickly.


Did Reddit stop using Cloudflare?


They switched from Cloudflare to Fastly a few months back.


Reddit moved to Fastly at some point.


That dip looks normal for that time of the day, btw. Click on the "Week" tab.


That's not the dip I'm talking about. Look again.


Ok, I see it. Same time as the error rates went high.


I can get to subreddits - but not reddit.com for the last hour or so.


Obligatory comment that I (and others) make every single time: can we please, for <insert deity name>'s sake, stop centralizing everything? We are literally throwing away all the benefits of a mostly-decentralized internet for the sake of convenience.


Do you have a viable alternative to protecting small websites from DDOS attacks?


You have to realize that DDoS mitigators are in a position to not stop attacks. They get paid more money when attacks happen, so any company whose sole purpose is mitigation has a major conflict of interest. A small site can easily be hosted on AWS, which has its own protection that is transparent. Any other cloud provider should offer it transparently anyway.

I absolutely hate people who claim Cloudflare is their only solution for mitigation/protection, because it simply isn't true, and Cloudflare does some rather shady stuff.


I feel like saying DDoS mitigators are in a position to not stop attacks is akin to saying car insurance companies are in a position to not stop car accidents. I think the value prop is the quality of the service WHEN the attacks happen, and when they aren't happening it is effectively an insurance-like business. However if I get DDoS'd and my mitigator does nothing, one would think they would eventually be overtaken by a more competent competitor.


Your analogy is accurate, but... If you don't have a mitigator, they have an incentive to force you onto one; if you are already on one, their incentive is throttling, or otherwise 'attacking' (loosely defined), your source.

With car insurance, the insurance company has an incentive to mitigate its risk (it doesn't want to shell out more than it needs to), charging more if you are higher risk. It doesn't want to take more risk than it has to. Key point: it evaluates risk on a case-by-case basis.

DDoS mitigators, however, have already invested in the risk by buying the hardware to handle the bandwidth. They don't care whether you are attacked or not, so nothing stops them from playing dirty. This kind of stuff frequently happened with Minecraft servers (what feels like) ages ago: mitigation services would go out and attack servers and competitors to get customers to switch to them.


A good DDoS mitigation service can take the brunt of the attack so you stay online. So it's not exactly like car insurance, unless insurance companies were actually able to put a steel wall in front of your car to prevent accidents.

You still get attacked, but it's like a frame around you, so you don't get hurt or damaged. (Here's an example of how Incapsula mitigates DDoS attacks - https://www.incapsula.com/ddos/ddos-mitigation-services.html)


> I feel like saying DDoS mitigators are in a position to not stop attacks is akin to saying car insurance companies are in a position to not stop car accidents.

Only if there's no overcharge when an attack happens. If there is, you are in the conflict of interest situation the GP was talking about.

I don't know what Cloudflare's billing policy is.


Freenet style distributed cache and name resolution. But that would require an Internet v3.


It's not a Cloudflare issue. They reported an incident at a major transit provider. This impacts a lot of sites, not just Cloudflare.


Idea: let's proxy half the internet through a private, proprietary service! We can get people to give us valid SSL certificates for their sites, too, and let's fuck up Tor while we're at it. We can totally handle it, right? Oh, and we need to pay for it somehow, so let's go the venture capital approach and just pretend we won't eventually hit a growth cap and ruin our company Twitter-style when we get there.

The crazy thing is people actually bought it.


I love a good pitchfork and torch session as much as the next rabble-rouser, but let's remember Cloudflare got popular because they _solved_ a hard problem: how to deal with a DDOS as a small or medium size website. Cloudflare's essentially a for-profit insurance pool for bandwidth. No individual site has enough bandwidth to handle a DDOS, nor can it afford that much, but pooled together, many sites can afford a service that can handle individual DDOS attacks. Even if you want to solve the problem without a profit motive, you still end up with a solution that is going to look very similar to Cloudflare.


Exactly. You don't solve a DDoS problem by having less capacity than your attacker and most individual companies can never afford the amount of bandwidth that is at Cloudflare's disposal.

Centralization was absolutely the best answer to that problem and will be for a long time. Almost nobody but Fortune 500 companies would be able to survive a DDoS otherwise.


Akamai has had and continues to have more capacity than Cloudflare.


Ok but they:

1) are way more expensive than Cloudflare

2) suck at DDOS mitigation (there's a lot more to it than just bandwidth)

3) don't care much about DDOS mitigation (it is a side business from their actual business, which is edge caching)

4) drop customers who actually get hit with big DDOS attacks (see #3 above--they will always prioritize caching customers over DDOS customers)

EDIT- oh and I forgot to say that if your site is HTTPS, you will have to give Akamai your keys, just like you do with Cloudflare.


For CDN, yes. For DDoS mitigation, no. Before they removed the figure from their website, they claimed about 1/5th the DDoS mitigation capacity of Cloudflare.


That doesn't matter at all. They've still got tons of capacity.

> You don't solve a DDoS problem by having less capacity than your attacker and most individual companies can never afford the amount of bandwidth that is at Cloudflare's disposal.

You can have less capacity than Akamai (many excellent providers have less) and still serve this purpose.


They do, but they are considerably more expensive to the point they aren't even competing with Cloudflare.


Don't know that they have more, but most colos I've used are using Akamai.


Source?


>solved

"worked around" is more appropriate, and introduced huge problems with their workaround.

The correct solution is to punish ISPs that permit this behavior to continue unchecked. We need offense, not defense. Any ISP that doesn't detect and kill DDoS participants needs to be severely throttled by other ISPs. Organizations like the FCC should be tackling this and levying fines against US-based ISPs for non-compliance and lobbying for foreign policies that punish foreign ISPs.


It's really hard to know what constitutes DDOS traffic at times. Suppose a Netflix show got really popular: do you cut it off? Let's make an exception for Netflix. What if a new competitor, blahflix, got popular quickly? Does its traffic get blocked?

Oh wait now blahflix needs to pay $$$ to get special privileges. Shit gets hairy real quick.

Suppose a DDOS happens from IoT devices. One of them is an important medical device that got hacked. Do you auto-shut it down and block its traffic? What about a secure, life-critical device behind the same IP through NAT that also gets blocked?

ISPs should remain dumb pipes. You really don't want to give comcast more power.


>It's really hard to know what constitutes DDOS traffic at times. Suppose a Netflix show got really popular: do you cut it off? Let's make an exception for Netflix. What if a new competitor, blahflix, got popular quickly? Does its traffic get blocked?

Well, presumably companies have arrangements with their ISPs for expected usage and such. There can be a grace period as well, where you hit up the user and say "hey, you're using a lot of bandwidth, is all well?" You also combine this with abuse reports from the victims if a DDoS is in fact underway. I don't think it's bad for an ISP to establish trust with a customer, either; this already happens with things like DMCA requests.

>Suppose a DDOS happens from IoT devices. One of them is an important medical device that got hacked. Do you auto-shut it down and block its traffic? What about a secure, life-critical device behind the same IP through NAT that also gets blocked?

Life-critical devices aren't exposed to the internet. IoT users should get throttled and receive a communication from their ISP telling them they have a malicious device on their network, with advice on how to fix the problem.


"One of this is an important medical device that got hacked"

If someone puts "an important medical device" on a network directly accessible from the internet, or on the same network as other IOT crap devices, they should be banned from ever working with computers.


Elon Musk is working on direct brain interfaces with computers. Soon, they'll be able to hack your brain!


You punish origin address forgery. That's enough.

If you are under attack and nobody is forging their origin, it's only a matter of you talking with your ISP to block the offenders.


This is a noble ideal that will never actually fly in practice. I can protect my site against DDoS by correcting architecture issues with one small set of companies: the hosting providers that my site sits behind, and the computers and architecture that make my solution work.

You're proposing that I protect my site by rewriting the rules for internet across the entire planet and punishing every single visitor (thousands upon thousands!!) who doesn't play by some new arbitrary rules that we then have to get everyone to agree on.

No, the ISPs should not be made to correct this kind of behavior, because it will be an eternal game of cat and mouse, and we've proven that the attackers can get around said blocks quite easily. Heck, often the "attackers" are grandma and grandpa types that clicked on a bad link and didn't know any better. Instead, we're taking the right approach here: identify bad incoming traffic at the destination, and drop it before it hits the backing servers. That's a solution we can actually reasonably apply.

I don't agree with a lot of what Cloudflare is doing, and I really wish we had more than one service like it that was as popular as they are, but they are doing good work. They're solving a huge need within the industry. I believe there should be more competition in the space, but I refuse to believe that the overall approach is inherently bad when it obviously works.


The solution can't be to hurt innocent traffic because a malicious user has some of the bandwidth. I agree some ISPs are complicit in the situation, but tit-for-tat approaches will ultimately hobble ISPs and create an irate and distrustful internet. Even if you do create a magical technical solution that solves all the challenges without hurting bystanders, then you face an even bigger challenge: the status quo.

There is no way to get from our current situation to the world you propose - there will never be a quorum from ISPs (or governments) on this sort of standard. It's a tragedy of the commons, and no single participant has enough leverage or interest in a new status quo.


The insurance analogy would only work if they commonly had divisions of people who wrecked people's stuff.


As someone who has had their company targeted by persistent DDoS / ransoms and been unable to afford protection from anyone but Cloudflare, I think your statement completely ignores the enormous value that they provide and focuses only on the negative.

Centralization comes with risks, but I think the risk is absolutely acceptable in this case until we come up with a better, decentralized approach to attacks that normal people can afford.


I have a medium-sized website; for $20 a month they take 80% of the requests and bandwidth, so I don't have to pay for a dedicated server and someone to run it.

I love them very much.

If they fuck up too often I can stop using them in 5 minutes, this is just perfect.


While I entirely agree with everything you said, it's worth pointing out that Cloudflare isn't really at fault for this particular outage.


Everyone is to blame here. Equal parts Cloudflare and everyone who uses their service.


Telia ate it, not CF.

Telia eats it quite regularly, but not often at this magnitude.


Cloudflare has a freemium model, so maybe you should find a better comparison than Twitter.


They both have revenue, just from different sources. The comparison isn't that bad. All VC-backed approaches (and public companies) eventually hit a growth cap, when their investors are going to expect unsustainable growth.


Companies with unclear or ad-based business models, like Twitter, are more likely to end up doing sketchy things to stay in business. Of course you can find exceptions either way but I think it generally applies.

That being said, I agree that centralization sucks, and I'm thinking about symbolically moving my tiny blog off Cloudflare for this reason. The ridiculous thing is that the origin is on GitHub Pages, so I'll have to move off there as well to be coherent.


"Make the Internet work the way it should".


Pingdom also reports "something is wrong on the internet": https://status.pingdom.com/


Bright yellow on white background?


Anyone have more concrete information on where the routing issues are and/or who's affected?


A lot of routing in Europe is done through Telia.


It looks like Telia is the issue.

Our CDNs are having problems all over the place. No indication of what shit the bed, but this is more than Cloudflare.


Why is this on the front page? It seems like whatever happened had nothing to do with Cloudflare. Can a mod remove this, as a status page is not news?


I'm inclined to agree, but if it's being voted up it's "news". It's as much news as "Github Down" when the submitted link times out.


I like to think of Cloudflare like insurance. Any single website may need it rarely if ever, but if it happens to you, you have little to no recourse that doesn't involve large sums of money.

Instead, you pay Cloudflare a regular, small amount of money† to reduce the risk of having to pay a large sum of money in case you're targeted. This sounds almost exactly like insurance to me.

† Sometimes the marginal cost is actually $0!


A more general question, since a similar situation has happened multiple times where we were not sure:

1) did we break our client server

2) did our internet provider die

3) did the service die

What are the recommended ways of finding out quickly and reliably where the fault lies in these cases?


Some of my experience and solutions to those issues:

1) UptimeRobot [0] - I use it to monitor various client websites. The free plan checks every 5 minutes, which should be enough. Notifications can be sent to email, Slack, SMS and many others. If you think there may be a problem only from some locations, make a quick check with [1]. If you suspect DNS issues, use [2] or [3].

2) Again, use UptimeRobot to monitor a device that is publicly accessible from your network. Moreover, if you are in control of your office network, pfSense [4] notifications when a network gateway goes down work well (still, that only works if you have 2 or more ISPs). Or use a dedicated monitoring device/service like Zabbix.

3) Using a Twitter-to-Slack notification, subscribe to updates from both the services you use and the major services responsible for the Internet backbone. For example, using GitLab comes with multiple times when the service dies (even though they are improving) - seeing the message in a dedicated Slack channel that the issue is already being worked on helps all team members skip unnecessary debugging [5] :)

Not affiliated with any of these services. Still - I met the UptimeRobot guys some time ago - they are a small startup based in Malta, are very cool and have a very stable service :)

[0] https://uptimerobot.com/

[1] http://www.super-ping.com/

[2] https://www.whatsmydns.net/

[3] https://dnschecker.org/

[4] https://doc.pfsense.org/index.php/Gateway_Settings#Gateway_S...

[5] https://twitter.com/gitlabstatus
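
If you want a quick DIY triage alongside these services, here is a minimal sketch (Python, stdlib only; every hostname/URL in it is a placeholder you would replace with your own) that tries to separate "our server is down" from "our network is down" from "the remote service is down". Ideally you run it from an external box such as a cheap VPS, so you aren't testing from inside the network that might itself be broken.

    #!/usr/bin/env python3
    # Rough triage: is it us, our ISP, or them? All hosts/URLs below are
    # placeholders -- substitute your own.
    import socket
    import urllib.request
    import urllib.error

    def tcp_check(host, port, timeout=3):
        """True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def http_check(url, timeout=5):
        """Status code on success, or None on any failure (connect error
        or HTTP error status)."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status
        except (urllib.error.URLError, OSError):
            return None

    # 1) Can we reach well-known anchors at all? If not, suspect our own ISP.
    internet_ok = tcp_check("1.1.1.1", 53) or tcp_check("8.8.8.8", 53)

    # 2) Is our own site reachable? (placeholder URL)
    our_site = http_check("https://example.com/healthz")

    # 3) Is the third-party service up? (placeholder URL)
    their_api = http_check("https://api.example-service.com/status")

    print("internet reachable:", internet_ok)
    print("our site:          ", our_site)
    print("their service:     ", their_api)

    if not internet_ok:
        print("-> likely our ISP / local network")
    elif our_site is None:
        print("-> likely our own server or its host")
    elif their_api is None:
        print("-> likely the remote service (or the path to it)")
    else:
        print("-> everything looks up from this vantage point")

It's no substitute for a real monitoring service, but it answers the 1/2/3 question above in a few seconds.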


Pro-active monitoring rather than reactive diagnosis.

Zabbix is an example piece of software, probably overkill for most but I haven't used anything else in the last 5 or so years so don't have any better suggestions.


External monitoring from AWS or a colo. Simple ICMP checks and TCP connects, plus possibly app-layer checks, allowed from these failsafes. Obfuscate as needed.


Uhh, monitoring those things? You can monitor your internet provider a number of ways; something like smokeping allows for an exceedingly simplistic test: ping stuff on the internet.

Logs will tell you if a client server or individual service has died. There are literally hundreds of solutions for these.
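
To illustrate the smokeping-style approach, here's a throwaway sketch (Python, shelling out to the system ping; the host list is purely illustrative) that probes your gateway, a well-known anycast IP, and the service you depend on, so a failure points at the right segment of the path:

    #!/usr/bin/env python3
    # Crude smokeping-style check: ping a few hosts and print loss/latency.
    # Pick targets inside your network, outside your ISP, and the service
    # you actually care about -- the ones below are placeholders.
    import subprocess

    TARGETS = [
        "192.168.1.1",   # your router / gateway (placeholder)
        "1.1.1.1",       # a well-known anycast resolver on the internet
        "example.com",   # the service you depend on (placeholder)
    ]

    for host in TARGETS:
        # -c 5: five probes, -q: summary only (Linux/macOS ping flags)
        result = subprocess.run(["ping", "-c", "5", "-q", host],
                                capture_output=True, text=True)
        print(host, "ok" if result.returncode == 0 else "FAILED")
        # Last two summary lines: packet loss and rtt min/avg/max
        for line in result.stdout.strip().splitlines()[-2:]:
            print("   ", line)

Run it from cron and log the output, and you have a poor man's baseline to compare against when things feel slow.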


They said they were scheduling some dashboard maintenance for today. Perhaps this had some unforeseen side effect?


This was not a Cloudflare-specific problem (see my comment above). Maintenance was yesterday not today and was completed.


The title made me think of Cloudflare Watch (http://www.crimeflare.com/)



