Ironic that their "please don't share our links" post shared on HN also caused their website to crash. It's a 100% static blog post. Use a damn cache, or CDN, or a hundred other ways of handling ~unlimited traffic for free. We are talking about a few thousand hits and tens of megabytes of total data transfer. It isn't 1998. This scale should not be a problem for anyone.
I’m entirely baffled why someone savvy enough to produce monitoring graphs and claim to already use Cloudflare can be brought to their knees by HN (maybe Reddit too, or something?) traffic on a frigging blog. I believe you need to actively sabotage your Cloudflare settings to achieve this.
Cloudflare probably respects your cache headers (I haven't used it, but would expect so). This website disables caching in its headers, so Cloudflare can't actually do much.
They should allow everything to be cached and use the Cloudflare API to invalidate the cache when they actually change something. That means that even huge spikes will have no impact on their backend and they’ll still be able to see changes within a second or two of publishing them.
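For what it's worth, a minimal sketch of that purge-on-publish idea, assuming Cloudflare's v4 purge endpoint; the zone ID, API token and URLs are placeholders, not their actual setup:

```python
# Minimal sketch of purge-on-publish against Cloudflare's v4 API (not
# ItsFOSS's actual setup; ZONE_ID, API_TOKEN and the URLs are placeholders).
import requests

ZONE_ID = "your-zone-id"
API_TOKEN = "your-api-token"

def purge_urls(urls):
    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/purge_cache",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"files": urls},  # or {"purge_everything": True} for a full purge
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# e.g. called from the CMS's post-publish hook:
# purge_urls(["https://example.com/blog/new-post/", "https://example.com/"])
```

Hooked into the CMS's publish event, that keeps the edge cache hot during spikes while edits still show up within seconds.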
This complaint is simply an attempt to shift the blame for their decision to ignore multiple decades of prior art. Even in the 90s we used techniques like caching to avoid this problem, and their page-load times now somehow manage to be worse.
Could Mastodon federate the link preview as well as the link?
Advice to the OP like "get a damn CDN" seems to be a response to the symptom rather than a correction of the issue.
Even poorly designed websites shouldn't face DDoS-scale access just because a social network shared a link. The social network should mitigate its massive parallelization.
The idea has been discussed. The main problem is that Mastodon is a zero-trust environment, and we can't trust the other server to send us the correct preview.
Don't trust the server, trust the poster/client... that's really where the preview should be coming from anyway. And if you _don't_ trust the poster, you shouldn't be trusting anything they link you to anyway. It's trivial to lie to these preview generators with a couple of meta tags if the destination isn't trustworthy.
Maybe it’s time for a new web spec: something that lets federated platforms request link previews with minimal resources. Realistically you need max two requests (one for the structured text, one for the preview media) and those can be heavily cached.
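Purely hypothetical sketch of what such an exchange might look like (this endpoint and its fields are made up; nothing like it is standardised today):

```
GET /.well-known/link-preview?url=https%3A%2F%2Fexample.com%2Fcats HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, max-age=3600

{
  "title": "Cats",
  "description": "Short summary for the preview card",
  "image": "https://example.com/tabby.jpg"
}
```

One tiny, cacheable JSON response plus the image itself, instead of every instance parsing the full page.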
1. You publish a post containing a link to example.com/cats.
2. Your post is federated to the instances your followers are on.
3. Those instances fetch example.com/cats in a fully automated way, and observe its preview image is example.com/tabby.jpg and fetch that as well. They save this locally.
4. When your followers view your federated post they see a short text extract from example.com/cats and a thumbnail of example.com/tabby.jpg
Step 3 is where the 'DDOS' is happening, and it's not a request issued by an agent of the original poster.
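A rough sketch of what step 3 amounts to on each receiving instance (not Mastodon's actual code; the User-Agent string is made up):

```python
# Fetch the page, pull og:* values out of the HTML <head>, then fetch the
# preview image and cache both locally on this instance.
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class OGParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content"):
                self.og[prop] = a["content"]

def fetch_preview(url):
    req = Request(url, headers={"User-Agent": "example-instance/1.0 (link preview)"})
    html = urlopen(req).read().decode("utf-8", errors="replace")
    parser = OGParser()
    parser.feed(html)
    image_url = parser.og.get("og:image")
    image = urlopen(image_url).read() if image_url else None
    return parser.og.get("og:title"), image  # stored in the instance's local cache

# Every instance that receives the federated post runs something like this,
# which is why one popular toot fans out into thousands of near-simultaneous GETs.
```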
A single user action can result in more than one HTTP request to a service. The rule you are referring to is really about things that don't have any user input at all, like crawlers. A user asked for this: they said "I want all my pals to know about this page". Even if it's a dumb thing to ask for, they asked for it.
The image isn't actually the issue: it's the html for the web page. The image is static and quick to serve, the html is often dynamically generated for each visitor.
Thinking about other similar distributed systems, if I put a link to an html page in an RSS feed or a mailing list message, it's not normal for subscribers to be using clients that fetch linked pages. But if they did it's clearly as an agent of that user, and not as an agent of the poster?
If I was the web service getting hit I'd be glad that somebody is even interested enough that I'm having this problem. I also have the wisdom to know complaining is unlikely to solve my issue.
I get paid to solve basically this sort of scaling problem so I'm not a great barometer, but it isn't that hard to handle a traffic spike like this.
It's not the 15k followers of ItsFOSS who are generating traffic, it is the servers of those 15k accounts' own followers, who see shares, boosts, or re-shares of the content. Given that Mastodon, and the larger Fediverse of which it is only a part (though a large share of total activity), are, well, federated (as TFA notes pointedly), sharing links requires each instance to make a request.
I'm not particularly clear on Fediverse and Mastodon internals, but my understanding is that an image preview request is only generated once on a per-server basis, regardless of how many local members see that link. But, despite some technical work on this, I don't believe there's yet a widely implemented way of caching and forwarding such previews amongst instances (which raises its own issues of authenticity and possible hostile manipulation). (There's a long history of caching proxy systems, with Squid being among the best known and most venerable.) Otherwise, I understand that preview requests are now staggered and triggered on demand (when toots are viewed rather than when created), which should mitigate some, but not all, of the issue.
The phenomenon is known as a Mastodon Stampede, analogous to what was once called the Slashdot Effect.
There's at least one open github issue, #4486, dating to 2017:
And jwz, whose love of HN knows no bounds, discusses it as well. Raw text link for the usual reasons, copy & paste to view without his usual love image (see: <https://news.ycombinator.com/item?id=13342590>).
> Have you considered switching to a Static Site Generator? Write your posts in Markdown, push the change to your Git repo, have GitHub/GitLab automatically republish your website upon push, and end up with a very cacheable website that can be served from any simple Nginx/Apache. In theory this scales a lot better than a CMS-driven website.
Admin's response:
> That would be too much of a hassle. A proper CMS allows us to focus on writing.
And I agree with them. Static site generators are nice for people who like that kind of workflow, but when you have a team with different personal preferences you need something that is easy to access and doesn't require much learning to get used to.
What would be ideal is a CMS that separates content editing from serving it; in other words, a kind of static site generator built into a CMS that can push updates to the static site as they happen.
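A hypothetical sketch of that idea (the endpoint path, port and build script are invented for illustration): the CMS fires a webhook on publish, and a tiny listener re-exports the static pages so the public site stays fully cacheable.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class PublishHook(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/hooks/publish":
            # e.g. export the CMS content and rebuild/upload the static site;
            # the script name here is a placeholder.
            subprocess.run(["./export-and-build.sh"])
            self.send_response(202)
        else:
            self.send_response(404)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 9000), PublishHook).serve_forever()
```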
It’s been a long time since I’ve been in the CMS space but I thought Wordpress had dozens of plugins for caching even 10 years ago.
I mean, even just hopping over to host your site in Wordpress.com was a viable option if you were in that middle ground between personal blog and having a dedicated server admin to handle your traffic
Hard to believe that you’d be in the business of serving content in 2024 and have to deal with the slashdot effect from 1999 for your blog of articles and images.
WordPress does have bunches of caching plugins. Some VPS hosts (shout out SiteGround) also have multiple layers of server-level caching that they will apply automatically.
You can take most WordPress websites from multi-second load times to 750ms or less (in fact as a regular exercise I set up fresh WordPress installs on dirt-cheap VPS hosts and see how low I can get them while still having a good site. 250ms to display is not uncommon even without CDNs)
True, but I suspect that wanting to self-host the content is a factor (at least it would be for me). And as others have mentioned, there seem to be issues with the Cloudflare setup that should have helped here too, so even with self-hosting, it should be possible to handle this.
Can you elaborate on this? My understanding was that a headless CMS has no frontend at all, and you build your own using whatever web dev tools you like. In other words, it is used to integrate CMS functionality into my website without hosting the full website inside the CMS.
Maybe it's because it's deemed "too difficult" to change it?
Years ago, I was looking into some very popular C++ library and wanted to download the archive (a tar.gz or .zip, I can't remember). At that time, they hosted it on SourceForge for download.
I was looking for a checksum (MD5, SHA-1 or SHA-256) and found a mail in their mailing list archive where someone asked them to provide said checksums on their page.
The answer? It's too complicated to add checksum creation to the release process, and SourceForge is safe enough. (Paraphrased, but that was the gist of the answer.)
That said, for quite a few years now they have provided checksums for their source archives, but that answer lost me years ago.
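For scale, the whole "complicated" step is a few lines; a sketch, with an illustrative archive name:

```python
# Generate mylib-1.2.3.tar.gz.sha256 alongside the archive so users can verify
# the download independently of the hosting site.
import hashlib

ARCHIVE = "mylib-1.2.3.tar.gz"

with open(ARCHIVE, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

with open(ARCHIVE + ".sha256", "w") as out:
    out.write(f"{digest}  {ARCHIVE}\n")  # same format plain `sha256sum` produces
```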
> - why is there a random 'grey.webp' loading from this michelmeyer.whatever domain?
This got me wondering, and the reason is that they embed a "card" for a link to a similar blog post on michaelmeyer.com (and grey.webp is the image in the card). There's a little irony there, I think.
Just as a reminder: the site sees plenty of traffic from various other platforms. It's quite popular; I'm sure Mastodon is among the least of their traffic concerns. If they can handle the load from plenty of other viral/popular traffic, the server is capable enough (even without proper caching).
Not every site configuration is perfect, and blaming the site's configuration while ignoring Mastodon's inherent issue is borderline impractical.
There have been other cases, such as where a mobile app developer has hardcoded an image from someone else's website into their app, then millions of users request it every time they open the app. Or where middlebox manufacturers have hardcoded an IP address to use for NTP.
Sure, having efficient and well-cached infrastructure setup is good, but there's only so much you can do to "reduce server load" where other people in control of widely-deployed software have made choices that causes millions of devices around the world to hammer _you_ specifically.
The people who made those choices don't give a shit, it's not _their_ infrastructure they fucked over. That's why you need to shame them into fixing their botnet and/or block their botnet with extreme prejudice.
Mastodon's link preview service is a botnet, Mastodon knows it, and they refuse to fix it.
Dear nincompoops: if you trust the original poster and original server to send you text and images in a toot, and federated instances to pass that around without modification... then you can trust them equally to send you a URL and an image preview. It's arrogance and idiocy that leads you to believe you can trust their images but can't trust their web preview images, and that you have to verify that yourself by having the Fediverse DDoS the host. This problem will only get worse as the Fediverse expands. Fix it now; don't ignore it because it makes a problem for someone else.
> if you trust the original poster and original server to send you text and images in a toot, and federated instances to pass that around without modification... then you can trust them equally to send you a URL and an image preview
I don't think it's that straightforward: normally, if I follow @amiga386@mastodon.example, you and I both need to trust mastodon.example to accurately report what you say. But if you put in a link to news.example and mastodon.example scrapes and includes a preview, I now need to trust mastodon.example to accurately report what news.example is saying. And I might well not!
While people have come to rely on centralised services' link preview generators as some kind of trustworthy source, this shouldn't be taken for granted, and Mastodon users definitely should not give that level of trust.
Even on centralised platforms, I've seen endless "screenshots" on Twitter of other web pages and other tweets, aping the form of screenshot-quotations, but actually doctored. The original page or tweet never said what the screenshot claims they did. And I've also seen just the text of tweets claiming that some person said X, when that person did not say X. There can be millions of people affected by this misrepresentation, because they use Twitter-only sources for their information, and don't verify what they see. Then there's the same problem one level up, the screenshot might be a valid screenshot of absolute poppycock published by a partisan news source.
This is why I phrased my original statement the way I did... if you trust the original poster and original server. It's quite possible you shouldn't. Trust should be anchored to the individual toot and its poster, and you should distrust them for misrepresenting link previews in the same way you should distrust them for any other misrepresentations they make.
Ultimately, you should not trust website preview links any more than you trust the person posting them. You should click through (if you even trust opening the link) to see if the preview matches, and you should use existing tools (blocking, defederation) for posters or servers who abuse your trust.
(Edit: and let's also add in the trust that the website represents itself equally to both you and the link preview generator code... a large number of sites become much more responsive and ad-free automatically if you claim to be GoogleBot.. and let's not even mention sites doing A/B testing for virality under the same URL)
Why this is a difficult problem to resolve is that the Mastodon community wants one thing - "trustworthy" per-server link previews, meaning tens of thousands of servers come thundering in around the same time (and millions of servers in future if they don't fix this) - at the expense of others, the link targets. Meanwhile, those affected by the Mastodon community's selfish behaviour want them to clean up their act, at the cost of something the Mastodon community thinks is precious and fears losing (trusting link previews).
I think solutions need to come from behavioural change, which is to say giving link previews no more trust than the other text or images on a toot, and from that direction it would be much more palatable to have the poster supply the preview, because viewers wouldn't be giving it undue trust.
> Even on centralised platforms, I've seen endless "screenshots" on Twitter of other web pages and other tweets, aping the form of screenshot-quotations, but actually doctored
That's a different problem: we're talking here about the equivalent of Twitter/FB/etc saying "this is the image and preview text at this link". Which you can trust the traditional social media platforms for.
> Trust should be anchored to the individual toot and its poster, and you should distrust them for misrepresenting link previews in the same way you should distrust them for any other misrepresentations they make.
Note that with Mastodon this could also be caused by their server admins.
> we're talking here about the equivalent of Twitter/FB/etc saying "this is the image and preview text at this link"
Even here, you can't trust this.
You can vaguely trust that the centralised provider won't modify the link preview; they don't appear to have been caught doing that, despite the fact they totally could... and possibly do, possibly only to specific people, possibly under duress from governments where they operate their servers.
However, it's been shown several times that Facebook won't let you post certain links, including links that are merely uncovering Meta's wrongdoing. So centralised providers still have the power to distort what users see, they just do it more directly and openly than misrepresenting link previews.
What Mastodon users want is to pretend they can have that same level of trust as they could with a centralised provider - which they know they can't, but being honest about that hurts adoption, so they go along with a lie, and instead make their link targets pay all the costs, to make themselves look better.
That's not right, and if their purpose is to make the internet a better place, they should be willing to compromise, be willing to prioritise being a good neighbour to other web services (e.g. add a disclaimer like "poster supplied this link preview image"), over having millions of link preview services DDoS a target so they can say "we have per-server trust in link previews".
While I agree with other commenters that a server should be able to gracefully handle the relatively small amount of traffic being generated here, I'm sympathetic to the notion that in this specific instance the problem is exacerbated by all the servers making their requests at almost exactly the same instant, i.e. the Thundering Herd Problem https://en.wikipedia.org/wiki/Thundering_herd_problem . If Mastodon wanted to address this, servers could add a small amount of random delay before fetching a link preview.
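A sketch of that, assuming each instance simply sleeps a random interval before its usual preview fetch (not what Mastodon currently does):

```python
# Spread the thundering herd over a window instead of hitting the origin
# in the same second; the delay ceiling is an arbitrary illustrative value.
import random
import time
from urllib.request import urlopen

def fetch_preview_with_jitter(url, max_delay_seconds=300):
    time.sleep(random.uniform(0, max_delay_seconds))  # e.g. anywhere up to 5 minutes
    return urlopen(url).read()  # stand-in for the instance's real preview fetch
```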
How would that work without allowing nefarious implementations to federate incorrect previews? Is that even really an issue? It doesn’t seem to be one technically, but I’m not well-versed on the social engineering implications.
Yeah, but I think it's worthwhile to try to stop it where possible. URL previews are insignificant enough that it makes more sense to me to just not render them than to render something potentially incorrect. I'd rather not get a URL preview than get one with no confidence that it's correct.
I'm not sure where it gets the known good hash from without making an HTTP call to the originating URL, though. The trustworthiness of the file hash is the same as the file itself if they have the same origin.
I would also consider to what degree such a system ends up looking like a half-baked, distributed Cloudflare anyways. Like yes, I'm sure we could build some kind of incredibly complicated, reputation-based, distributed link preview caching system. Or the host could just fix their Cloudflare (or accept dying under traffic load).
My generalized experience has been that untrusted, secure, distributed systems are incredibly difficult to build, and that it's probably not worth doing for something as trivial as URL previews. Just let the request die and swap the preview for a message like "This site was down as of $lastTimeItCheckedForAPreview". Maybe change the message so it shows the URL but doesn't make it an <a> element so people can't just click on it to discourage sending them further traffic.
Or worst case, they could try a fallback to reliable, centralized sources for that info. See if Google Search has a cached copy, or archive.org, or whatever else. It's not decentralized, but I also think it should be fine to use non-decentralized features as a fallback for optional features. I've got more confidence that archive.org hasn't tinkered with their version than some random Mastodon instance.
I noticed this last year when an article I wrote was reposted by the creator of Mastodon. Visits from individual instance crawlers vastly outnumbered visits from people reading the content.
I had prepared the content for potential virality (hand-written HTML & well-optimised images) but it was still an unwelcome surprise when I checked the server logs and saw all that noise.
The funny thing is that Cloudflare works perfectly well with default out-of-the-box settings. You have to go in and actively sabotage your own site to make it as broken as this one is right now.
It really doesn't work well by default. It won't cache HTML, and it will serve up CAPTCHAs for subresources like JS and images (which users can't complete), and for RSS feeds, so that people can't subscribe.
They can enable a setting that forces users to complete a captcha before accessing the website. That should prevent all the mastodon requests from getting through.
Previews only use the HTML head - to get OG metadata - and usually an image.
Never thought about this before. One of my sites (a single-page application) is 45k of HTML plus a 40k image, hence about 50% of what is served for previews is wasted.
It would be nice if there was a way to recognise an "only html head needed" type of request. Don't think there is?
Nope, unless things have changed since I last dealt with it, Open Graph metadata is not considered "http-equiv" so does _not_ get served as an http header, only as a meta tag in the html header... so the HEAD verb won't get them for you.
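For reference, the Open Graph data is ordinary meta tags inside the document's <head>, along these lines:

```html
<head>
  <meta property="og:title" content="Example article title">
  <meta property="og:description" content="Short summary shown in the preview card">
  <meta property="og:image" content="https://example.com/preview.jpg">
</head>
```

An HTTP HEAD request only returns response headers (Content-Type, Content-Length, and so on), none of which carry the og:* values, so the fetcher still has to GET at least the start of the HTML.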
That seems like a problem of It’s Foss’ website, not Mastodon. If you’re hosting a presumably static website, a news website no less, it should be able to handle a spike of viewer influx if you make a viral article. Seems like they’re pinning the blame on Mastodon rather than fixing their site.
It sounds like the author's website/cache/whatever isn't configured properly, but it does sound like a terrible design for the preview data to not be included in the federated data.
Several thousand, yea. That many, unlikely in a single batch. Favouriting and boosting are actions that amplify how far a Mastodon post is going to spread on a “connected instances” spectrum.
But a popular account (15k followers, as in this case) will definitely spread the link to thousands of instances almost instantly and cause thousands of individual GET requests.
It’s fun to tail access.log and see it happen in real time.
> Does it mean that posting a link on Mastodon generates 36704 requests to that URL?
No, it's a ratio of bytes of traffic. They're comparing the size of one request to post to Mastodon ("a single roughly ~3KB POST") against the total size of content served from GETing the URL in that post.
IDK how valid this metric is, but that's what they're saying.
It's the language of "traffic amplification" as a network attack (1), as used in a DDoS. It's perhaps more relevant when comparing the total bytes across a number of IP packets of various sizes.
I'm not saying that this is the optimal framing, but that's what they were going for: talking about this, correctly or not, as a de facto DDOS.
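If the ratio is taken at face value, the arithmetic works out to roughly

    ~3 KB (the single federated POST) × 36,704 ≈ 110 MB of GET responses served,

which is in the same ballpark as the ~100 MB figure mentioned elsewhere in the thread.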
So how does being linked to on Twitter, Reddit, and Facebook not do the same? Those sites combined clearly would generate an order of magnitude more traffic. Maybe several orders of magnitude more. There has to be more than 15k itsfoss readers between these sites.
A link preview with an image is over 100 MB? That sounds insane. And if they mean the total traffic in 5 minutes was 100MB that cannot possibly be bringing Cloudflare to its knees. That is an indictment of Cloudflare’s CDN then!
I think what’s being implied here is that, when you share a link to Facebook, Facebook will access the page to generate a link preview, so will download a tiny bit of HTML and an image. But when you share a link on mastodon, that link immediately gets propagated to many other mastodon servers, which then propagate it to others, so suddenly many thousands of mastodon instances are simultaneously downloading a little bit of HTML and an image, and the cumulative effect of that in this instance was 100MB over a minute or two.
It does seem like a typical static website ought to not have a problem serving that, especially if it’s behind Cloudflare. It seems odd that a single EC2 instance would have a hard time serving that.
But given that more than one person is complaining about it, it also seems like each Mastodon instance could very easily delay propagation of the story by a few minutes to soften the blow here.
> each Mastodon instance could very easily delay propagation of the story by a few minutes to soften the blow
I liked that idea at first glance, but thinking about it, CDN performance would actually be better with a single huge burst than if they were smeared out (assuming a very short max-age so the site can be updated rapidly).
It's actually more complicated. With HTTP you don't know whether requests can be coalesced until you receive the response headers with Cache-Control and Vary. So if your website takes a few seconds to respond, most CDNs will send every single request in that period through.
In theory a CDN could optimistically coalesce requests then re-send them when the headers of the first one return. But this is very complex and rarely done in practice.
This can also occur any time the cache goes stale and needs to be refetched.
> most CDNs will send every single request in that period through.
I don't think this is true. It certainly isn't for any CDN that I've worked for or on.
Cloudflare don't do this either - they use a cache lock - the first request basically acts as a blocker for all the others, leaving the other requests waiting for the response (if it's cacheable they serve that response, if not then they proceed to origin).
It's normally configurable, but most sane CDNs do have it enabled by default, precisely because big bursts tend to be sharp in nature and a cache miss can be origin breaking at that point.
Just for completeness's sake, nginx's HTTP proxy module can do it too (the setting is proxy_cache_lock), though it is off by default there.
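A minimal sketch of what enabling it looks like (the cache path, zone name and upstream are illustrative, not ItsFOSS's actual config):

```nginx
# Only one request per cache key goes to the origin at a time; the rest
# wait for that response instead of stampeding the backend.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=blog_cache:10m
                 max_size=1g inactive=60m;

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_cache blog_cache;
        proxy_cache_valid 200 5m;
        proxy_cache_lock on;
        proxy_cache_lock_timeout 5s;
        # Serve a stale copy while a single request refreshes it in the background.
        proxy_cache_use_stale updating error timeout;
        proxy_pass http://127.0.0.1:8080;
    }
}
```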
It probably does. The issue is proxying: that's the downside to TLS and Let's Encrypt, because man-in-the-middle is a thing. Being a fediverse application, the proper idea would be for the clients to share the data P2P amongst themselves and lighten the load on the originator. However, a link probably should not do that, so each server goes and gets the data itself. It is a DDoS where basically one link can magnify out to thousands of other clients, and until those clients are done they will stomp the site. Perhaps there could be a side-channel ask from the client: "hey site, are you cool with me showing the user a preview of this?" Or "hey site, is it OK if I share a preview of your page?" That would not totally stop the rush of requests, but it would at least avoid a full render of the page for a preview. This method is half-baked and something I just came up with, and it seems wrong, but auto-following links is not exactly in spec either.
It is not the overhead of the TLS bit but the chain of trust. Putting a caching proxy in the middle of the system breaks that trust chain. You can proxy with TLS, but it is tricky to do while maintaining that chain of trust. A proxy would be a classic way to mitigate (not eliminate) that sort of issue, either on the serving side or the client side: it basically lowers the data being returned, but not the number of calls from the clients themselves (and this is a DDoS-type issue). TLS just makes it harder to run a proxy.
Cloudflare is either passing traffic through without touching it, or, as a proxy, doing TLS termination, since they serve certificates trusted by most devices/browsers/OSs/etc.
None of this really has anything to do with what's happening with OP.
If I think like an attacker I would say a preview would be an even juicier target. I can basically run something on thousands of peoples computers with zero input from them.
The thing is, you are allowing your client to do things without you telling it to, which means you trust whoever is sending that link. To render the preview you have to run a bit of JS and grab some other URLs; "running" is probably not a great term for it. In this case, if I were an attacker, I could basically cause your Mastodon client to render a preview without the user doing anything. I would call it a possible attack vector. Instead of the originating client rendering it and sending a picture along with the URL, it is telling the other side "hey, here is a URL", and then each client on its own goes to get a new snapshot. Some sort of render needs to happen for that picture to be created. In the second case, that means if I were a sneaky sort and knew an exploit in the Mastodon render code, I could send a link out to a group and cause interesting things to happen. I could also use this to attack sites: if I got into a large enough group, I could basically spam the group with a bunch of URLs and cause a DDoS to any victim site I want.
Each of those caches the preview/etc., so they hit it once in a while, and that works for all their users. Each Mastodon server is independent and can't share that preview cache.
All the issues sound more like a them problem than a Mastodon or Cloudflare problem.
Something's wrong with their setup. Hard to tell as now HN has brought them to their knees
Cloudflare being unable to handle 100MB of traffic over 5 minutes sounds like itsfoss failed to set something up properly. The link preview is a fraction of the total page that would have to be loaded for each individual user no matter where they came from.
ActivityPub itself could more intelligently cache or display the content but something doesn’t add up if Cloudflare can’t handle that kind of traffic let alone Hacker News traffic.
vbezhenar: They've misconfigured something and the page itself is marked max-age=0, which disables Cloudflare caching. I'm also betting nginx isn't caching either.
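Assuming an edge rule that makes HTML eligible for caching at all, headers along these lines (values picked arbitrarily) would let the shared cache serve pages while browsers still revalidate quickly:

```
Cache-Control: public, max-age=60, s-maxage=300
```

Combined with an API purge on publish (as suggested upthread), edits would still show up within seconds.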
Cloudflare has some other features like a Web Application Firewall (WAF) and bot protection/filtering (which seems like it would solve their problems too?)
From the article:
> Presently, we use Cloudflare as our CDN or WAF, as it is a widely adopted solution.
To me, it sounds like the author isn't really familiar with what the difference between a CDN and a WAF is, or familiar with Cloudflare beyond it being a popular thing they should probably have, for that matter.
Well all Twitter links (annoyingly) are made to go through t.co, so either Twitter is generating and serving the link preview directly or it's its shortening service that's being hit, probably for a cached response, not the upstream.
But I agree this does seem strange, like it shouldn't be unmanageable load at all.
Not the best solution for them, but what about removing the metadata from the pages' <head> that lets Mastodon and other social media platforms request the preview images? It would hurt discoverability and maybe clicks, but that's better than the site having downtime.
Wasn’t there a TLS extension in draft for verifiable archiving (like DKIM for HTTPS responses)? I can’t seem to find it now, but that could help support an authenticated link preview that doesn’t amplify as it reaches other servers.
This DDoS'ing problem is due to how the Fediverse works. The problem is that federation in social networks is, in general, extremely inefficient, with tons of flaws, and this issue is one of them.