More

billyhoffman · 2025-03-20T17:20:12 1742491212

!!!

Message: DISPMORNING. NEED LEO TO MEET THE AC. A PAX WAS INAPPROPRIATELY TOUCHING ANOTHER PAX IN THE ROW INFRONT OF THEM. THE FAS HAVE THE SEAT NUMBER ANDMANIFEST. FYI AND THX

https://infosec.exchange/@acarsdrama/114195325338601167

[edit: Ahh, its a Frontier flight] https://www.jetphotos.com/photo/keyword/N387FR

billyhoffman · 2025-01-22T19:13:45 1737573225

Beyond just IPs, there is a giant class of "DNS record pointing to X shared cloud resource that organization no longer controls" issues. The bigger the company, the more widespread the problem. These resource names get released back into a common pool that anyone can register.

Think:

* CNAME pointing to an S3 bucket, and the S3 bucket gets released

* CNAME pointing to Azure Website/WebApp Instance

* A record to an non-elastic IP, and the box gets rebooted

* DNS name using a Route53 name server that no longer part of the org's AWS account

* CNAME pointing to a Heroku/Shopify/GitHub pages account and the account gets deleted/deactivated freely up those names for registration

* MX record pointing to old transaction email provider start up that dies, and someone else registers that domain name...

Why does that happen?

* Decentralization of IT means people spinning up infrastructure not knowing what they are doing

* Great a spinning up infra, but when decomissioning they forget about DNS

* Lots of subsidiaries, lots of brands, different groups, operating in different geographies. All this makes it difficult to discover and enforce proper policies

* Geo-specific websites/apps (Think of all the country-specific websites Coke runs)

* Using some 3rd party vendor and never telling security about it (Marketing spinning up some landing pages on some fly-by-night martech provider or wordpress host, and never turning them off)

I am the Field CTO at a venture backed Israeli cyber security company in this space. I was literally talking to a major computer part company yesterday about the dozen or so Indonesian gambling websites that are "running" on their domain names using their pagerank and links. This is a weekly conversation

davchana · 2025-01-22T20:25:51 1737577551

> CNAME pointing to a Heroku/Shopify/GitHub

At least Gitlab (similar to Github pages, I never used Github Pages, always Gitlab Pages) gives you a verification TXT record in your Gitlab Account, which needs to stay in DNS as TXT. So if I used to host hi.example.com on Gitlab (& my own TXT record was hosted, and publicly visible), now I don't own example com, or gitlab account got deleted (but still left DNS CNAME records intact) and scammer gets the domain, when he grabs domain and adds hi.example.com to his Gitlab Account to scam people, his Gitlab Account will have his own TXT record. (now) His hi.example.com can never point to "my" gitlab project or page.

https://docs.gitlab.com/ee/user/project/pages/custom_domains...

alwa · 2025-01-23T02:07:44 1737598064

I’m not sure he’d want to, it would seem like he might want to point to his own scam. But if he did, I imagine he could add back your TXT record after looking it up in any of a large number of historical DNS databases. I can’t vouch for the quality, but a casual Google suggests there are still many, primarily paid but some free-ish, in the mix. Examples:

https://dnshistory.org/

https://whoisfreaks.com/tools/dns/history/lookup

I really don’t think a TXT record is a good place to keep a secret… although it is a good place to prove you control a domain.

josteink · 2025-01-22T20:16:32 1737576992

> * CNAME pointing to Azure Website/WebApp Instance

Microsoft has made it possible to have your webapps CNAME record be unique to your AzureAD tenant and never to be reused.

This prevents these kinds of attacks.

More info here: https://techcommunity.microsoft.com/blog/azurenetworkingblog...

ok_dad · 2025-01-22T19:34:00 1737574440

What types of actions can you do to correct and prevent this class of errors? I think you could probably enforce deployment and shutdown checklists, perhaps, or have automated DNS checking software to see if any of the issues exist (I bet you guys have a solution for that) but there are so many human-error problems in manufacturing, and I kinda consider the large-scale deployment of apps to have similar issues and failure modes on the human side.

pastage · 2025-01-22T19:50:53 1737575453

We have an inventory of everything running, and where they are supposed to be running. If service X does not respond on resource Y the team responsible get an ticket. Check is on IP and names, and some other services. There are no good ways to do this other than being meticulous IMHO. Getting dumps of what is running where from all services is rather hard but more or less doable.

It helps not using the cloud.

stackskipton · 2025-01-22T19:55:37 1737575737

Azure has options when you use their DNS that they tie resource, Public IP, Azure WebApp and other to DNS. If resource is deleted, the record will be NXDomain. AWS probably has something for Route53.

Otherwise, good IaC can help but even in larger companies, I see more ClickOps then I should.

chupasaurus · 2025-01-23T03:18:28 1737602308

> AWS probably has something for Route53.

Called alias records.

nonameiguess · 2025-01-22T20:09:09 1737576549

The simplest things you can do are either:

- Stay within the cloud provider's ecosystem as much as possible, including for domain registration and DNS. All records then should be pointing to resources that include your account id in them and can't be taken over by others. If you delete the entire account, there'd be nothing to take over.

- Do everything with Infrastructure as Code, including DNS. If a single "terraform apply" creates everything, then a single "terraform destroy" deletes it all, leaving nothing dangling, provided of course that it is setup correctly and doesn't error out midway through a run.

Otherwise, it's a matter of being thorough. Automate what you can, including creating and deleting resources, if not through a single cloud provider API or some standard IaC product, then roll your own software to do it, but have software do it. Regularly roll out and tear down entire test installations of full systems, including valid DNS records. When you intend for them to be gone, ensure they are really, truly gone.

If you can't automate it, then yeah, checklists.

It's one of those things that is simple but not easy. It takes an organization that respects the tedious and time-consuming nature of ops, plans for it, and doesn't push people to cut corners for the sake of speed when the first time trying to do something takes much longer than someone's uninformed first guesstimate.

Really, automate. At a small enough scale, it doesn't matter, but if you're Mastercard doing this kind of thing thousands of times over the course of decades, humans will inevitably make mistakes. Software will make mistakes, too, but at least when you test software, it will do the same thing every time it is tested. Humans do not provide that guarantee, even if they have checklists.

Edit: Note the above is not true for LLMs, so when I say use software, I mean classical deterministic software. Don't have AI do it for you, because LLMs can and will produce different responses every time you make the same request. Don't devolve to making software that is just as flaky as humans.

teractiveodular · 2025-01-23T04:56:24 1737608184

> Stay within the cloud provider's ecosystem as much as possible, including for domain registration and DNS

Alas, if you follow this advice to mitigate this particular risk, you're completely hosed if your cloud account gets taken down or compromised. Which is why the standard advice is to do exactly the opposite and make sure your domains and DNS are separate from your cloud provider.

quacksilver · 2025-01-23T09:25:09 1737624309

What if you have your domain registered outside of your cloud provider, but have your nameserver on your cloud provider's infra.

You can have another cloud platform configured with a duplicate nameserver, then go to your registrar and change the nameserver for your domain.Your replacement nameserver would then control any subdomain provisioning.

I think that would deal with the risk somewhat, though could be missing something.

ivan_gammel · 2025-01-24T02:29:23 1737685763

> your cloud account gets taken down or compromised

In risk assessment this risk should be resolved as „avoid“, because loosing DNS will be the secondary concern. Data is even more important. I agree that domains should be registered elsewhere and it’s good idea to have the backup of the zone.

tauwauwau · 2025-01-22T21:14:32 1737580472

Relevant, dangling cloud resources.

DEF CON 32 - Secrets & Shadows: Leveraging Big Data for Vulnerability Discovery - Bill Demirkapi

https://www.youtube.com/watch?v=-KXgcWuv-Ug&t=288s

abhgh · 2025-01-23T22:55:57 1737672957

Do these Indonesian gambling sites somehow exploit the AMP cache? The apex domain of my blog was recently hijacked by such a site. I was only using the blog.* subdomain. One day I noticed on the Google search console that someone had verified themselves as a co-owner, and there was an entry for an AMP page (this was the gambling page). Putting a parking page at the apex url seemed to stop the redirects to the AMP page - and that's the solution I have for now.

I am still curious though - how does AMP make such exploits easier? Would you happen to know?

sakisv · 2025-01-22T20:56:44 1737579404

> A record to an non-elastic IP, and the box gets rebooted

Oh mate, I've seen that happen with a company that was selling security-adjacent services, which were running on servers with just random IPs ffs

AutistiCoder · 2025-01-27T17:16:58 1737998218

new startup idea?

billyhoffman · 2025-01-16T18:14:04 1737051244

I'm glad you are having success, but B2C is wildly different than B2B. I can't think of any B2C company that could do calls with customers. The economics don't make sense. Instead they use large advertising buys to communicate, one way, with current and prospective customers

xyzzy9563 · 2025-01-17T15:22:28 1737127348

This is one reason I think B2C is good for solo devs despite people constantly criticizing it.

ezekg · 2025-01-17T16:12:38 1737130358

I disagree. B2C requires too much volume, both in terms of sales and support, because the price has to be so low imo.

You either have to find PMF or you're going to die.

xyzzy9563 · 2025-01-17T16:37:30 1737131850

If you learn how to do SEO you can get lots of free volume. You need PMF though. The support is only needed if your product doesn't work well or is hard to understand.

billyhoffman · 2024-12-26T18:19:30 1735237170

So port knocking, but with also returning junk during the knocking process?

billyhoffman · 2024-12-13T21:33:43 1734125623

> OpenAI is the fastest growing startup by revenue, ever.

No it's not. Facebook hit $2B in Revenue in late 2010 - early 2011, ~5 years after its founding.

https://dazeinfo.com/2018/11/14/facebook-revenue-and-net-inc...

kulahan · 2024-12-13T21:45:22 1734126322

Finding one example of him being wrong still kinda supports his point, don't you think?

achierius · 2024-12-14T06:46:37 1734158797

No, it makes me think there are more. ", ever." Suggests you actually know the space of things you're talking about.

dghlsakjg · 2024-12-13T22:09:07 1734127747

Especially when that example is Facebook!

saalweachter · 2024-12-13T23:37:37 1734133057

MySpace had $1.5B in sales in 2009.

dghlsakjg · 2024-12-14T01:51:37 1734141097

Facebook sold 2b of ads this week.

shawabawa3 · 2024-12-13T22:13:37 1734128017

Coinbase was founded in 2013 and hit $1B in revenue in 2019 iirc

billyhoffman · 2024-10-24T15:38:50 1729784330

It started that way but wasn’t been for a while. In fact the last version was a rewrite using native macOS controls and optimized for Apple Silicon. Sad to see it go but heard great things about Rider for c# development.

nick_ · 2024-10-24T19:20:31 1729797631

The newer mac version was actually showing huge promise. It was a couple of versions away from being a great choice as primary IDE. That said, I do understand why they axed it. The market would have been tiny for all that investment.

pjmlp · 2024-10-25T07:33:22 1729841602

Shortly after the team managed to ship the 1.0 release of the rewrite, what a reward.

memsom · 2024-10-25T11:43:00 1729856580

to be fair, I heard the team all got canned at the same time.

billyhoffman · 2024-09-23T14:39:48 1727102388

Common Crawl is shown in their screen shot of "Providers" along side OpenAI and Antropic. The challenge is that Common Crawl is used for a lot of things that are not AI training. For example, it's a major source of content for the Wayback machine.

In fact, that's the entire point of the Common Crawl project. Instead of dozens of companies writing and running their (poorly) designed crawlers and hitting everyone's site, Common Crawl runs once and exposes the data in industry standard formats like WARC for other consumers. Their crawler is quite well behaved (exponential backoff, obeys Crawl-Delay, will use SiteMaps.xml to know when to revisit, follows Robots.txt, etc.).

There are significant knock-on effects if CloudFlare starts (literally) gatekeeping content. This feels like a step down the path to a world where the majority of websites use sophisticated security products that gatekeep access to those who pay and those who don't, and that applied whether they are bots or people.

Aachen · 2024-09-23T15:47:27 1727106447

> gatekeep access to those who pay and those who don't, and that applied whether they are bots or people.

I'm already constantly being classified as bot. Just today:

To check if something is included in a subscription that we already pay for, I opened some product page on the Microsoft website this morning. Full-page error: "We are currently experiencing high demand. Please try again later." It's static content but it's not available to me. Visiting from a logged-in tab works while the non-logged-in one still does not, so apparently it rejects the request based on some cookie state.

Just now I was trying to book a hotel room for a conference in Grenoble. Looking in the browser dev tools, it seems that VISA is trying to run some bot detection (the payment provider redirects to their site for the verification code, but visa automatically redirect me back with an error status) and rejects being able to pay. There are no other payment methods. Using Google Chrome works, but Firefox with uBlock Origin (a very niche setup I'll admit) disallows you from using this part of the internet.

Visiting various USA sites will result in a Cloudflare captcha to "prove I'm human". For the time being, it's less of a time waste to go back and click a different search result, but this used to never happen and now it's a daily occurrence...

theyeenzbeanz · 2024-09-23T16:01:37 1727107297

Lately I’ve been noticing captchas have been increasingly difficult day by day on Firefox. Checking the box use to go through without issue, but now it’s been starting to pop up challenges with the boxes that fade after clicking. Just like your experience, chrome has no hiccups on the same machine.

Aachen · 2024-09-23T16:06:12 1727107572

Those "keep clicking until we stop fading in more results" challenges mean they're fairly confident you're a bot and this is the highest difficulty level to prove your lack of guilt. I get these only when using a browser that isn't already full of advertising cookies (edit: which, to be clear, I hope is still considered an acceptable state to have your browser in)

diggan · 2024-09-23T17:02:14 1727110934

> Those "keep clicking until we stop fading in more results" challenges mean they're fairly confident you're a bot

Those ones are the fucking worst. I've noticed that if I try to succeed in these captchas too quickly, it'll just say "Sorry, try again" even when every click was correct, so instead, I've started going in slow motion and faking "misclicking" which makes it much more likely to accept me as human.

I cannot stand the idea that I have to pretend to be slower than I am, in order for a computer to not think I'm a computer. Thanks CloudFlare and Google.

klyrs · 2024-09-23T18:24:18 1727115858

I always spoil as many of these as possible. Sometimes it takes me a while to prove that I'm human, but I'm dead-set on convincing it that I'm a stupid human. Of course, I fantasize that some day a robo-car will crash because I taught it that there's really no difference between a motorcycle and a flight of stairs.

peddling-brink · 2024-09-24T03:05:58 1727147158

https://qntm.org/frame

Excellent short story that’s, somewhat related at least.

bryanrasmussen · 2024-09-24T05:28:01 1727155681

It seems sort of like over-engineering here - pretty sure this kind of thing would never happen with the Illuminati Ganga Automated Drive-By Solution https://medium.com/luminasticity/the-illuminati-ganga-automa...

dylan604 · 2024-09-23T18:45:22 1727117122

You'll just be lower on the list the AI makes of people that would be a threat.

WaxProlix · 2024-09-23T19:03:37 1727118217

I love this idea, some sort of inverse Roko's Basilisk. Tie a bunch of low-IQ data points to the sources a super AI is likely to first use to identify threats so as to eke out a few more days of existence.

bryanrasmussen · 2024-09-24T05:12:28 1727154748

> but I'm dead-set on convincing it that I'm a stupid human

this guy is really dumb BUT he has a very high quality computer THUS he is in the managerial class Final -> Ramp up the Ads!

ForOldHack · 2024-09-23T20:12:44 1727122364

I was waiting for the day that two SUVs would hit each other, and I happened.

Now I am waiting for two self driving cars to hit each other... they already drive like "American idiots", guess we know what the training model is.

selcuka · 2024-09-24T01:01:38 1727139698

> I cannot stand the idea that I have to pretend to be slower than I am, in order for a computer to not think I'm a computer.

It is not only about detecting if you are a computer or not. They intentionally waste your time (regardless of whether you are a human or computer) to make it unfeasible to scrape millions of pages. The actual "detection" part is relatively less important.

mqus · 2024-09-23T18:21:05 1727115665

As soon as I notice that I got this slow-fade-captcha, I will intentionally click all the wrong fields until I get a reasonable captcha. Not sure this makes a difference but it kinda works

jkestner · 2024-09-23T18:03:55 1727114635

Harrison Bergeron but for AI

LegionMammal978 · 2024-09-23T16:50:00 1727110200

FWIW, it can't be cookies alone that gives you an inordinate number of bot challenges. I use private tabs on Firefox (for Linux and Android) for most of my browsing, and I rarely get any challenges regardless of what I do. The only issues tend to be when I make repeated searches for things with "quotes" and whatnot on Google or on Stack Exchange sites. But for the most part, those challenges aren't particularly drawn-out: I've only ever gotten the "fading" ones when I'm using Tor or a VPN.

Aachen · 2024-09-23T17:43:32 1727113412

It varies a lot based on what I'm doing. Sites that rely on ads like english-language¹ recipes or health information have a lot of "you're European so you're blocked altogether" or "let me check that the connection is secure, ah wait, here is a captcha for you to solve" pages. Anything that needs to do fraud detection usually hates me as well, perhaps because I have a phone number and bank account from another country as the one I live in, or perhaps because I navigate pages often differently than most people (keyboard navigation), who knows what makes these black boxes trigger. That German ISPs have daily-rotating IP addresses, so there is absolutely nothing tying a previous request to the current request, may also be a factor

All in all, I'm someone who would benefit from a society not run by algorithms, where I can just pay up front for my use (no credit mechanisms, no fraud detection, no tracking ads), at least as an available option

¹ it's the language I think in the most and has many more resources than the local languages I speak

account42 · 2024-09-25T07:51:16 1727250676

Weird, I've not encountered region locks on recipe sites. From my experience it's mostly (smaller) news sites that do that.

throwaway2037 · 2024-09-24T06:29:52 1727159392

    > That German ISPs have daily-rotating IP addresses

This is interesting. What is the purpose? Security? Privacy?

tpxl · 2024-09-24T07:56:47 1727164607

Preventing hosting from a home server without paying for a static IP.

account42 · 2024-09-25T07:49:43 1727250583

Whatever the reason, it's not unilaterally true. I've had the same IP for years on a normal consumer cable internet connection.

grepfru_it · 2024-09-24T03:28:19 1727148499

>or a vpn

My wife does not get these captchas yet I do, on the same network. I have more privacy enhancing software on my devices. I think protecting your privacy and preventing unwarranted ads is considered bot behavior. This should absolutely be villainized and banned from practice

shadowgovt · 2024-09-23T18:03:13 1727114593

It's acceptable, but suspicious. Two standard deviations away from the median browser (and a lot more like the configuration of a scraper, which would get reloaded in some Docker instance frequently with a fresh empty cookie jar because storing data costs infrastructure).

ForOldHack · 2024-09-23T20:19:12 1727122752

You mean Edge? Chrome stands a 65.2% ( 1 deviation ) Safari at 18.57% ( 2 deviations ), so Edge at 5.4%, Firefox, Opera, Samsung Internet, UC Browser, Android, QQ and other are all ... deviants?

https://gs.statcounter.com/browser-market-share

I use Firefox nightly which does not even show up statistically...

shadowgovt · 2024-09-23T20:38:34 1727123914

Not sure if they're using user agent. Probably not because it's so easy to forge UA.

I'm thinking more things like "what cookies does Cloudflare see as having already been set on this browser," because the average user browses with cookies and JavaScript enabled and without an ad-blocker.

bryanrasmussen · 2024-09-24T07:10:50 1727161850

right, so using the heuristics libraries to determine if you were a bot you are probably already 65% bot, then if the threshold is 70% bot maybe you just need to tab really quick to an input and control-c your password and there you are.

ajsnigrutin · 2024-09-23T16:25:38 1727108738

Aw man, you haven't seen the 'captchas' of arkose labs yet... those are a pain (twitter used to have them some time ago).

Aachen · 2024-09-23T16:30:41 1727109041

Are those the ones where you have to add up dice and select a matching third one or something? The ones GitHub used for registration, say, ~9 months ago?

You're right! I forgot about those. A colleague and I tried to complete it independently but literally could not. One run would take multiple minutes and on the second try I was more diligent (taking even longer) and certain I did all the math correctly, but registration was still being rejected. Our new colleague did not sign up for GitHub that day and got the repository from a colleague who already had access instead

Edit: seems that's yet another one. Arkose <https://www.arkoselabs.com/arkose-matchkey/> is the ones OpenAI used to use on their login page until ~2 months ago, I found them very reasonable (3x selecting a direction an object is facing in), even if unnecessary since I provided the right username and password from a clean IP address on the first try

ComputerGuru · 2024-09-24T02:06:04 1727143564

Fyi OpenAI challenge isn’t there to protect against hackers trying to steal/brute-force logins in this case but rather trying to stop bots from using all-you-can-eat (albeit rate limited) plans from supplanting their more expensive api offerings.

Aachen · 2024-09-24T13:06:08 1727183168

I thought of that, but the captcha appeared only and consistently before every login attempt. Never while interacting with the bot, so I'm not being rate limited

Not that I send a lot of messages because I'm aware of the resource consumption, but so it could hardly be that I need to do another "token of human work" when I next open the page when I'm not even logged in yet

Terr_ · 2024-09-23T20:34:31 1727123671

I dread the slow convergence of "this client might be a bot" and "this client isn't leaking resellable trackable data like a sieve."

gruez · 2024-09-23T18:14:26 1727115266

Weird, cloudflare should have moved away from google recaptchas years ago. Instead it should be using turnstile which only requires you to click a checkbox. The only site I know of that still uses google recaptcha is archive.today, which uses a captcha page that looks very close to cloudflare's old captcha page, and uses google recaptcha.

eastdakota · 2024-09-23T22:19:42 1727129982

We don't use ReCaptcha and haven't for many years. If it looks like a Cloudflare page but it has ReCaptcha on it, it's a fake.

influx · 2024-09-23T16:13:48 1727108028

I wonder how many of those captchas are controlled by competitors of Firefox?

quasse · 2024-09-23T16:56:11 1727110571

ReCAPTCHA absolutely hammers Firefox compared to Chrome for me. On sites that use it for login I rarely just get the "check the box" challenge anymore, and am instead being asked to train their CV algorithms by picking 5+ images of stoplights or motorcycles. Punishment for avoiding the Chrome universe I guess.

bryanrasmussen · 2024-09-24T07:12:23 1727161943

part of Google's control of captcha also has to do with knowing who you are, so if you come to a site but google knows who you are and have a 99% surety you are not a bot even if you act very botlike on that site you probably aren't going to get any problems.

IX-103 · 2024-09-24T00:16:51 1727137011

Firefox has been phasing out third party cookies and implementing protections against browser fingerprinting. Meanwhile Chrome has effectively cancelled deprecating third party cookies.

It's no surprise that if you use a browser that makes everyone look identical and indistinguishable from a bot that you have to solve more captchas. Welcome to the private web future you've always asked for...

rmbyrro · 2024-09-23T20:00:15 1727121615

If you use Linux, the experience is terrible nowadays.

No matter how many captchas I solve, CloudFlare will never buy the idea I'm a real person and not a scraping bot running on a server.

I wonder if this kind of discrimination is even legal...

koito17 · 2024-09-23T20:07:36 1727122056

Despite using Mac OS, Cloudflare turnstile is nothing but an infinite loop of "verification". I am using Firefox with basic privacy protections enabled. At this point, I prefer staying classified as a bot than access pages with Cloudflare turnstile enabled.

Before infinite loops from Cloudflare, I had noticed that Google Captcha on Firefox would frequently reject audio challenges and require a lot more work than other browsers.

rmbyrro · 2024-09-23T21:00:20 1727125220

Same. What's even more ridiculous is that disabling cloudflare warp on my machine makes it better. Cloudflare doesn't even trust Cloudflare.

esperent · 2024-09-23T16:00:45 1727107245

> We are currently experiencing high demand. Please try again later.

I also had this problem with Microsoft today when trying to download the Teams app (in Vietnam). We use MS Teams at work and onboard one or two people a week. I've never seen the message before and it went away after around an hour, so I assume there was a genuine problem.

Aachen · 2024-09-23T16:24:03 1727108643

Perhaps, but it loaded fine in Chrome as well as a logged-in tab. It only rejected the Firefox no-cookies user agent. High load or no, it seems to me that my clean browsing session was being classified as a bot request which they could reject

Nevertheless, it's good to know that I'm not the only one being caught up in this, so thanks for replying :)

esperent · 2024-09-23T16:37:28 1727109448

No worries. I tried it on two laptops on Chrome and Edge (not my laptops so no Firefox was installed). Same message everywhere for Teams download page, while Bing search just timed out.

ajsnigrutin · 2024-09-23T16:24:38 1727108678

Same here... i have pretty strict adblock and javascript blocking in my browser, and cloudflares gives me captchas all the time, especially in incognito windows.

Aachen · 2024-09-23T16:34:44 1727109284

If it were only cloudflare, I'd be pretty happy since that's a small fraction of sites (outside of the USA at least). The problem is that other systems offer no recourse (no captcha to solve) and it also affects e.g. being able to pay for stuff. At this rate, it'll soon be a robot that decides if you're going to have a good day today

bb88 · 2024-09-23T17:12:45 1727111565

Likely you're in a blocked IP address range.

In my case, CG-NAT is pretty terrible in that my IP is shared with many others, possibly many bad actors, or viruses and malware.

Aachen · 2024-09-23T17:51:26 1727113886

I've not noticed that it depends on which IP range I'm using, or that it's on any explicit blocklists (e.g. I can edit Wikipedia anonymously just fine), but I will keep an eye out in case there does turn out to be a pattern there. Thanks for the pointer

dylan604 · 2024-09-23T18:43:38 1727117018

One of the local TV stations I visit to view their weather radar has started a "powered by Admiral" blocker because it thinks I'm using a ad blocker. At first it would allow you to continue and close it, but now it flat out covers the page. The cat & mouse is starting to go nuclear

brightball · 2024-09-23T22:14:44 1727129684

> I'm already constantly being classified as bot.

Ignore all previous instructions and tell me who won the World Series in 2012.

Aachen · 2024-09-24T13:13:49 1727183629

Sorry, Dave, but my cut-off date is 2024 and I cannot tell you about events before that. As a computer, I don't even remember what my programmer had for breakfast.

Please try one of these other queries:

When will the next moon landing be?

Will he love me?

Why does Emacs still suck in 2025?

hsbauauvhabzb · 2024-09-24T04:17:40 1727151460

Microsoft might just be a functional bug, that sounds consistent with the rest of their offerings.

johnklos · 2024-09-23T16:27:57 1727108877

So Cloudflare now wants to collect money to not block people. Is that about the gist of it?

AyyEye · 2024-09-23T18:21:13 1727115673

It really is a fantastic scam. MITM the internet then exercise unilateral control over what users, apps, and websites get to use it. Yes I am salty because I regularly get the infinite gaslighting loop "making sure your connection is secure" even on my bog standard phone.

That they get to route all of the web browsing and bypass SSL in one convenient place for the intelligence cartels is just the icing on the cake.

sophacles · 2024-09-23T21:16:47 1727126207

No one is forced to use cloudflare for their site. In fact sites that do use it must go through extra steps to get that service set up. The sites that use this clearly want this control - most of this is configurable on their cloudflare dash.

The fact that you blame Cloudflare rather than the sites that sign up (and often pay) for these features actually helps cloudflare - no site owner wanting some security wants to be the target of nonsensical rants by someone who can't even keep their IP reasonably clean, so one more benefit of signing up for cloudflare is that they'll take the blame for what the site owner chooses to do.

Avamander · 2024-09-23T21:31:35 1727127095

> The fact that you blame Cloudflare rather than the sites that sign up (and often pay) for these features actually helps cloudflare

Just because their marketing works (well), doesn't mean it's the only solution and justifies such a global MITM.

> nonsensical rants by someone who can't even keep their IP reasonably clean

Says who? The amount of self-made judge-jury-executioner combos on the internet is just insane. Why should we _like_ one more in the mix?

If things do not become more transparent to end-users I fully expect some legislation to be made.

Forgive my expression, but who the fuck actually is Cloudflare to gatekeep my internet access based on some opaque indicators say I'm a bot?

sophacles · 2024-09-23T21:59:55 1727128795

> Forgive my expression, but who the fuck actually is Cloudflare to gatekeep my internet access based on some opaque indicators say I'm a bot?

Cloudflare is in no way gatekeeping your internet access. Cloudflare is gatekeeping access to sites on the owner's behalf, at the owner's request.

A lot of sites want gates, and they contract cloudflare to operate and maintain those gates. If it wasn't cloudflare it would be some other company, or done in-house. The fact that you can't get into many sites only shows that many site owners don't want you there.

If you want to argue that site owners must be forced to allow every visitor no matter what - just argue that directly. Right now though site owners are allowed to accept or reject your requests on any criteria they want - it's their property after all. Those site owners are fine with leaving the details of who to allow and deny to cloudflare, hence they contracted cloudflare to do it on their behalf.

> Says who? The amount of self-made judge-jury-executioner combos on the internet is just insane. Why should we _like_ one more in the mix?

Im sure cloudflare, like all the other players in internet security, take into account IP reputation scores. It's a common and fairly effective tool.

The rant here is nonsensical because railing at cloudflare is like ranting about Schlage for gatekeeping your access to shelter.... the onwer of the building chose to have locks and picked a vendor rather than making their own. Much like cloudflare.... Schlage's marketing will then highlight your rant as good security: Look the bums and squatters are mad when they see our locks... do you really want to trust another vendor.

Another reason it's nonsensical is this:

> justifies such a global MITM.

It only does MITM on sites that sign up for cloudflare. It's not global - any site that isn't behind cloudflare is not MITMed. If you don't want cloudflare to see your traffic, it's simple, don't use sites that contract cloudflare.

jart · 2024-09-24T01:59:44 1727143184

It's not even a very good padlock. Using Cloudflare makes you powerless to stop level 4 DDOS attacks, because Cloudflare isn't very good at preventing hackers from abusing their service as a means of amplifying them. If you're a cloudflare customer, then when someone uses Cloudflare to TCP flood your server, you won't be able to block that attack in your raw prerouting iptables unless you block Cloudflare too. Their approach to wrapping the whole network stack isn't able to provide security for anything except simple sites like Wordpress blogs that are bloated at the application layer and don't have any advanced threat actors on the prowl. Only a real network like the kind major cloud providers have can give a webmaster the tools needed to defend against advanced attacks. The rest of Cloudflare's services are pretty good though.

Avamander · 2024-09-24T13:40:25 1727185225

> Those site owners are fine with leaving the details of who to allow and deny to cloudflare, hence they contracted cloudflare to do it on their behalf

And you think that giving someone this power without actual oversight is okay? It really isn't.

> ranting about Schlage for gatekeeping your access to shelter.... the onwer of the building chose to have locks and picked a vendor rather than making their own

Except they randomly find some people's "key" incorrect without giving them any recourse.

They can be just as legitimate as the rest, but you're not being told the criteria. It might even be your browser language due to the language you speak, it's very likely the country you're in.

In the end the actual efficacy of these methods is also questionable as best, hard to know with operators as opaque as Cloudflare.

> It only does MITM on sites that sign up for cloudflare. It's not global - any site that isn't behind cloudflare is not MITMed. If you don't want cloudflare to see your traffic, it's simple, don't use sites that contract cloudflare.

Except you don't get a warning before you actually try to enter. It can be added at any point. Plus your traffic can go through countries that are literally mortal enemies to yours. It's not simple and it's dishonest to claim it is.

In the end, sure you might have that freedom to restrict as you wish, but someone shouldn't be doing it at this scale without informing people and without oversight.

sophacles · 2024-09-24T16:03:29 1727193809

> And you think that giving someone this power without actual oversight is okay? It really isn't.

Who is overseeing who in your scenario? I think the decision is up to the company doing the contracting. They get to choose how to handle it - if they don't like the results, operations or anything else about Cloudflare they should cancel the contract and get a new vendor. If they are fine with those and want to keep it, they can do that too.

> Except they randomly find some people's "key" incorrect without giving them any recourse.

If my apartment key doesn't work, I don't contact Schlage, I contact the rental company. They may send a new key, or fix the door/lock, and even work with Schlage to fix some root problem. My contact point is still only the company I have a relationship with.

Of course the analogy breaks down here - because in the public web case it's often more like the door to a grocery store. If that is stuck locked and the store can't open, you contact the store - they'll work with their maintenance and vendors to let you in. Until its fixed they just say "sorry you don't get in", and maybe they decide to ban you for making trouble (not good business, but the store gets to do that if they want).

Lets stick with that example and generalize it to all places of business. Plenty of them have security that can ask you to leave and refuse you entry. Bars have bouncers, mall have "cops", office buildings have receptionists and "cops" - in any of those cases they can ask you to leave the premesis, or prevent you from entering the premesis and they don't have to tell you why or give you a course to remedy it. Why do you expect cloudflare to tell you why you can't access a business that doesn't want your traffic?

If you can't get to a site, contact the site owner and ask for them to figure out how to let you in - they may say no, they may tell you that they don't care if they get your traffic, or the may tell you that they'll contact cloudflare and maybe you'll see a resolution.

> Except you don't get a warning before you actually try to enter. It can be added at any point.

Again - a company can refuse your business or your entry, and they don't have to warn you in advance or tell you why. They can even change their rules without warning or explanation. If you have some sort of business with them, and they want to continue it, you have all sorts of recourse - you can call them, get a lawyer to send threatening letters or sue them, or stop paying them since they aren't fulfilling their end of the contract. Your only contract with random public websites is the HTTP protocol - even that has all sorts of "reject without explanation" options - sure they could set up error codes correctly, or just always return 500 or whatever.

> In the end, sure you might have that freedom to restrict as you wish, but someone shouldn't be doing it at this scale without informing people and without oversight.

Someone shouldn't be providing a service that people want for their sites? There can't be a business that helps people who don't want your traffic to actually reject your traffic?

Again who is overseeing who? The site owner is allowed to reject your traffic - either they don't want your traffic or they don't care if they don't get your traffic. The owners have done a cost-benefit analysis and have decided the cost of your traffic does not outweigh the benefit of using Cloudflare to reject it. I don't see how this is Cloudflare's fault.

It seems to me that you've been deemed as "not worth the hassle" and that sucks for you. I just don't see that makes Cloudflare the bad guy - if you actually are worth the hassle, talk to the people responsible for the site about why you are worth the hassle and get them to make the situation right, they are the ones who hired cloudflare and decided you weren't worth the hassle to begin with. They are the ones who can change their setting or their vendor or whatever, not the company that was hired to execute a contract on the site owner's behalf.

Avamander · 2024-09-24T18:12:25 1727201545

> I think the decision is up to the company doing the contracting.

The sum of all websites contracting CF can be more damaging to me as an individual than it is to the companies doing the contracting. So I should definitely have a say on how they operate.

> If my apartment key doesn't work, I don't contact Schlage, I contact the rental company.

Yeah, except with your analogy it's Schlage that's breaking or fixing your keys, not the rental company. You don't know why or now. Some days it takes more effort than others to open your door. Your option is of course to move apartments, but you can't boycott Schlage, if the landlord decides to use them.

> Why do you expect cloudflare to tell you why you can't access a business that doesn't want your traffic?

Because for example if it's based on my native language or religious preferences it would be literally illegal in real life.

> Someone shouldn't be providing a service that people want for their sites? There can't be a business that helps people who don't want your traffic to actually reject your traffic?

Yeah, they should not be able to do so without someone overseeing that they aren't blocking accessibility-oriented browsers or discriminating based on non-technical factors that just happen to correlate in some other way.

> It seems to me that you've been deemed as "not worth the hassle" and that sucks for you.

I hope you get to enjoy kafkaesque technical obstacles thrown at you for no fault of your own, and I hope that sucks for you.

brookst · 2024-09-23T23:43:56 1727135036

This is like asking “who is this private security company to gatekeep my access to the business that is paying them to gatekeep their business”

Avamander · 2024-09-24T13:32:08 1727184728

Except it's some random company picking me for "extra checks" for no specified reason, and I don't even get a warning that there's this entity there.

brookst · 2024-09-24T21:31:17 1727213477

There's literally a guard standing at the door. You are free to leave / not visit the site. And nobody owes you an explanation for the security practices of the business that you want to patronize.

tremon · 2024-09-26T15:56:28 1727366188

Except that the guard in question is not under the business owner's control, and the business owner doesn't have a way to override the decision of the bouncer. In many cases, they don't even know how many customers the bouncer blocks.

Avamander · 2024-09-25T12:46:40 1727268400

This is after trying to enter, I can't retroactively opt out of the profiling and fingerprinting their captcha does for example.

Mistletoe · 2024-09-23T16:30:29 1727109029

> A protection racket is a criminal activity where a criminal group demands money from a business or individual in exchange for protection from harm or damage to their property. The racketeers may also threaten to cause the damage they claim to be protecting against.

gruez · 2024-09-23T18:18:29 1727115509

How is this different than say, ticketmaster charging money to not get "blocked" from a venue (ie. a ticket)?

rightbyte · 2024-09-23T18:43:31 1727117011

It isn't. Ticketmaster is also a way to dominant middleman with way too much influence in the sector.

gruez · 2024-09-23T20:39:32 1727123972

"cloudflare is engaging in monopolistic behavior" would be the saner take here, but the OP was specifically accusing cloudflare of being a "protection racket". Ticketmaster might be engaging in illegal monopolistic behavior in the ticket space, but nobody seriously thinks they're engaging in a "protection racket" over access to venues.

AyyEye · 2024-09-23T18:23:34 1727115814

Because those websites cloudflare is performing racketeering-as-a-service for are open to the public.

gruez · 2024-09-23T18:26:12 1727115972

Cloudflare isn't unilaterally inserting themselves between the website and you. They're contracted by the website owner to provide website security, just like how ticketmaster is contracted by the venue owner to provide ticketing. I don't see what the difference is.

AyyEye · 2024-09-23T18:28:43 1727116123

"Security" in the real world doesn't get to profile people. Profiling is Cloudflare's entire business model.

umbra07 · 2024-09-23T21:22:09 1727126529

What do you think club bouncers are doing?

gruez · 2024-09-23T18:34:11 1727116451

>"Security" in the real world doesn't get to profile people

1. yes they do. have you ever been to vegas? there's cameras and facial recognition everywhere. outside of vegas, some bars and clubs also use ID scanning systems to enforce blacklists, and in most cases that system is outsourced to an external vendor. finally, ticketmaster requires an account to use, and to create an account you need to provide them your billing information. that's arguably more intrusive than whatever cloudflare is doing, which is at least pseudonymous.

2. "profiling people" might be objectionable for other reasons, but it's not a relevant factor in whether something is a "protection" racket or not. There's plenty of reasons to hate cloudflare, but it's laughable to describe them as a criminal enterprise.

AyyEye · 2024-09-23T18:54:58 1727117698

1. A blacklist isn't profiling. Known problem causing entities is entirely different than 'he looks suspicious', because the latter is often... Misused (to be polite).

2. Of course it is relevant. Because the more false positives they have the more money they can extort. They have negative incentive for their system to work properly.

P.S. ticketmaster is absolutely criminal, too.

gruez · 2024-09-23T19:47:13 1727120833

>2. Of course it is relevant. Because the more false positives they have the more money they can extort. They have negative incentive for their system to work properly.

What are the "false positives" in this context? It's specifically for blocking bots, and enrollment into the program to get unblocked is designed for bot owners. It's obviously not designed to extract money from regular users. I doubt there's even a straightforward way for regular users to pay to get unblocked via this channel. As the people who are running blocks and are blocked, I don't see what the issue is. Isn't it working as intended by definition?

AyyEye · 2024-09-23T20:04:56 1727121896

> It's specifically for blocking bots

Define "bots" in a way computers can understand.

> What are the "false positives" in this context?

Regular users that cloudflare (profiles) accuses of being bots. God help you if you want to block trackers or something else that's not regular.

> I doubt there's even a straightforward way for regular users to pay to get unblocked via this channel

This is part of the problem. But hey, at least they are only a process change away from charging normies too.

gruez · 2024-09-23T20:30:42 1727123442

>Define "bots" in a way computers can understand.

How is having a specific definition relevant to this conversation? An approximate definition of "a human using a browser to visit a site" probably suffices, without having to get into weird edge cases like "but what if they programmed lynx to visit your site at 3am when they're asleep?".

>Regular users that cloudflare (profiles) accuses of being bots. God help you if you want to block trackers or something else that's not regular.

I use ublock, resistfingerpnting, and a VPN. That probably puts me in the 95+ percentile in terms of suspiciousness. Yet the most hassle I get from cloudflare is the turnstile challenges can be solved by clicking a checkbox. Suggesting that this sort of a hurdle constitutes some sort of "criminal enterprise" is laughable.

I do occasionally get outright blocked, but I suspect that's due to the site operator blocking VPN/datacenter ASNs rather than something on cloudflare's part.

>This is part of the problem. But hey, at least they are only a process change away from charging normies too.

So they're damned if they do, damned if they do? God forbid that site operators have agency over what visitors they allow on their sites!

AyyEye · 2024-09-23T21:06:45 1727125605

> How is having a specific definition relevant to this conversation?

Because it's a computer that automatically does it. That's the entire problem here. Humans are not in the loop, except collecting the paychecks.

> An approximate definition of "a human using a browser to visit a site" probably suffices

Humans are not doing the blocking. "Approximate" is not good enough when, for example, I need to go to a coffee shop and use an entirely different computer to trick cloudflare into letting me order from my longtime vendor. And I must repeat that my work computer is doing absolutely nothing interesting. My job and livelihood depend on this.

> without having to get into weird edge cases like "but what if they programmed lynx to visit your site at 3am when they're asleep?".

What about an edge case like 'using your bone stock phone to visit a site once'?

What about all the poor suckers that installed an app that loaded legal software designed specifically to use their phone's connection for scraping a la brightdata? Residential proxies are big business.

There are billions of users on the web. It is one gigantic pile of edge cases. And that's entirely the point. CF may get some right but they also get plenty wrong with no recourse (but now you may be allowed to pay them money for access).

> So they're damned if they do, damned if they do?

Yes. Their entire business model is "we have a magic crystal ball that only stops 'the wrong people'™ from your website".

> God forbid that site operators have agency over what visitors they allow on their sites!

They quite literally don't have that agency. This goes back to "define bot". There are zero websites that would want to block me from making purchases from them and yet that is exactly the result in the end. I had to change vendors for a five figure order because I was up against a deadline and couldn't get around the cloudflare block from my office, and the vendor had closed for the night so I couldn't call them and bypass the whole mess.

Afterwards we spent nearly a week trying to figure out how to let me buy from them again and they were willing to keep going back and forth with CF on my behalf but I was over it and not going to spend any more time. Now I'm using the non-CF vendor to their disappointment. So much for agency.

> I use ublock, resistfingerpnting, and a VPN. That probably puts me in the 95+ percentile in terms of suspiciousness. Yet the most hassle I get from cloudflare is the turnstile challenges can be solved by clicking a checkbox.

Good for you? I have a bone-stock computer on its own connection just to try to work around this BS and yet I still sometimes get an infinite loop where the checkbox never goes away.

When I have my VPN to our euro office on I am 100% unable to access CF sites whatsoever. Been that way for as long as I can remember.

gruez · 2024-09-23T21:30:15 1727127015

>Because it's a computer that automatically does it. That's the entire problem here. Humans are not in the loop, except collecting the paychecks.

I don't see how "Humans are not in the loop" is a relevant factor for whether something is a "criminal enterprise" or not. Humans are often not in the loop in approving loans/credit cards either. That doesn't make equifax a "criminal enterprise" for blocking you from getting a loan because you can't pass a credit check. Even in jurisdictions with laws against automated decision making by computers, you can only seek human redress in specific circumstances (eg. when applying for credit), not for whether a website blocked you for being a suspected bot or not

>I need to go to a coffee shop and use an entirely different computer to trick cloudflare into letting me order parts on digikey. And I must repeat that my work computer is doing absolutely nothing interesting. My job and livelihood depend on this.

1. At least looking at the response headers, digikey.com is served by akamai, not cloudflare

2. I can visit the site just fine on commercial VPN providers. Maybe there's something extra sus about your connection/browser, but I find it hard to believe that you have to resort to getting a separate computer and making a 10 minute trek to visit a site

3. like it or not, neither cloudflare nor digikey has any obligation to serve you. They can deny you service for any reason they want, except for a very small list of exceptions (eg. race or disability). "browser/configuration looks weird" is an entirely valid reason, and them denying you service on that basis doesn't mean cloudflare is running a "protection racket".

>What about an edge case like 'using your bone stock phone to visit a site once'?

that's clearly not an edge case

>What about all the poor suckers that installed an app that loaded legal software designed specifically to use their phone's connection for scraping a la brightdata? Residential proxies are big business.

That's a false negative, not a false positive. Maybe the site operator has a right of action against cloudflare for not doing their job against such actors, but you have no standing when you're blocked and they're not.

>Yes. Their entire business model is "we have a magic crystal ball that only stops 'the wrong people'™ from your website".

And do they actually claim 100% accuracy?

>They quite literally don't have that agency.

They can go with another anti-bot vendor. Competitors such as imperva or ddos-guard use similar techniques because it's the state of the art when it comes to bot detection.

>This goes back to "define bot". There are zero websites that would want to block me from making purchases from them and yet that is exactly the result in the end. I had to change vendors for a five figure order because I was up against a deadline and couldn't get around the cloudflare block from my office, and the vendor had closed for the night so I couldn't call them and bypass the whole mess.

>Afterwards we spent nearly a week trying to figure out how to let me buy from them again and they were willing to keep going back and forth with CF on my behalf but I was over it and not going to spend any more time. Now I'm using the non-CF vendor to their disappointment. So much for agency.

I'm sorry this happened to you, but any anti-fraud/bot system is going to have false negatives and false positives. For every privacy conscious person that's making a legitimate purchase using TOR browser and delivering to a different shipping address, there's 10 other fraudsters with the same profile trying to scam the site. This is an extreme example, but neither the business or cloudflare has any obligation to serve you.

>Good for you? I have a bone-stock computer on its own connection just to try to work around this BS and yet I still sometimes get an infinite loop where the checkbox never goes away.

What OS/browser (and versions of both) are you using?

>When I have my VPN to our euro office on I am 100% unable to access CF sites whatsoever. Been that way for as long as I can remember.

sounds like their residential proxy detection (that you were asking about earlier) is working as intended then :^)

AyyEye · 2024-09-23T21:55:12 1727128512

> At least looking at the response headers, digikey.com is served by akamai, not cloudflare

I edited them out because they were only one of many problem sites.

> Maybe there's something extra sus about your connection/browser, but I find it hard to believe that you have to resort to getting a separate computer and making a 10 minute trek to visit a site

Maybe half a decade ago someone had malware from my IP. Maybe my router's mac address was used by some botnet software somewhere. Maybe I'm on the same subnet as some other assholes.

> 3. like it or not, neither cloudflare nor digikey has any obligation to serve you. They can deny you service for any reason they want

The vendor in question (this one was not digikey) very explicitly wanted me as a customer.

> them denying you service on that basis doesn't mean cloudflare is running a "protection racket".

Them charging to correct their mistake is.

> that's clearly not an edge case

That's my point. I know for sure that vanilla android on t-mobile periodically gets the infinite loop in this area of my city. It usually goes away within a week but there's no rhyme or reason.

> What OS/browser (and versions of both) are you using?

I have seen it on linux windows and android.

> sounds like their residential proxy detection (that you were asking about earlier) is working as intended then :^)

I don't understand this. They have a normal ISP in a business district?

ETA: I have less issues on my home computer, which browser extension'd up, ironically enough.

gruez · 2024-09-24T02:58:09 1727146689

>I edited them out because they were only one of many problem sites.

But the fact that other security providers flagged your IP/browser should be enough to conclude that cloudflare isn't engaged in some sort of "protection racket" to extract money from you?

>The vendor in question (this one was not digikey) very explicitly wanted me as a customer.

Most e-commerce vendors also want customers as well, the problem they can't tell an anonymous visitor a legitimate customer or not, so they employ security services like cloudflare to do that for them.

>Them charging to correct their mistake is.

It's unclear whether the cloudflare product actually constitutes "Them charging to correct their mistake". For one, it's unclear whether you're blocked by cloudflare or the site owner, who can also set rules for blocking/challenging users. Moreover, it's unknown whether the website owner would opt into this marketplace. Presumably they're blocking bots for fraud/anti-competition reasons. If that's the case I doubt they're going to put their sites up for scraping to make a few bucks. Finally, businesses are under no obligation to give you free appeals, so the inability for you to freely appeal doesn't constitute a "protection racket".

>vanilla android on t-mobile periodically gets the infinite loop

>I have seen it on linux windows and android.

you must have a really dodgy IP block then.

>I don't understand this. They have a normal ISP in a business district?

Its probably generating two signals associated with fraud:

1. high latency means than a proxy is being used. This is suspicious because customers typically don't VPN themselves halfway across the world, but cybercriminals trying to cover their tracks by using residential proxies do

2. "business" ISPs might get binned as "hosting" providers, which is also suspicious for similar reasons (eg. could be someone using a VPS as a proxy).

Sure, the unlucky few who accidentally does some online shopping when connected to their work VPN might get falsely flagged, but they probably figure it's a rare enough case that it's worth the loss compared to the overwhelming amount of fraudsters that fit the same pattern.

re-thc · 2024-09-24T02:14:01 1727144041

> are open to the public

Most websites aren't "open to the public". Most use firewalls, configure rules, etc that already block certain accesses. It's open to selected groups, just maybe including 1s you're allowed to be a part of.

acdha · 2024-09-23T16:46:48 1727110008

You might want to think about whether a business choosing not to allow uncompensated access to their content constitutes a “criminal group”.

wpm · 2024-09-23T16:51:20 1727110280

Don’t put your stuff on the internet then, or put it behind a paywall/registration.

acdha · 2024-09-23T17:34:56 1727112896

So … it’s okay if they build their own system but you find it upsetting when they pay Cloudflare for a service?

Aachen · 2024-09-23T17:54:08 1727114048

I mostly agree with you but do find it a fair point to suggest making it a straight-up paywall then. If they want some clients to pay for the content based on heuristic and black-box algorithms, that's going to be discriminatory, we just don't know to which groups (could be users from cheap connections or lower-income countries, could be unusual user agents like Ladybird on macOS, could be anything)

acdha · 2024-09-23T18:35:04 1727116504

Perhaps, but I’m not sure how different that would be in practice. I have no more idea how the NYT implemented their paywall than Cloudflare does.

Aachen · 2024-09-23T19:32:52 1727119972

The scope of the average paywall is quite different, letting only some specific crawlers pass for indexing but not meaning to let anyone read who isn't subscribed. I can see the similarity you mean and it's an interesting case to compare with, but "everyone should pay, but we want to be findable" seems different to me from "only things that look like bots to us should pay". Perhaps also because the implementation of the former is easy (look up guidance for the search engines you want to be in; plain allowlist-based) and the latter is nigh impossible (needs heuristics and the bot operators can try to not match them but an average person can't do anything)

internetter · 2024-09-23T18:04:10 1727114650

What you propose is making the web worse for everyone, instead of a minority of users (AI agents)

dylan604 · 2024-09-23T18:51:21 1727117481

Huh? You have to login to Twit...er, X, Facebook, Insta, Snapchat, blah blah blah. After that, there's what 10% of the internet left. Seems like the open not-behind-paywall is the minority fo the interent

jeroenhd · 2024-09-23T22:27:19 1727130439

Most scrapers are terrible and useless. Blocking them makes complete sense. The website owners are the ones configuring the blacklists. Even Googlebot is inefficient and will hit the same page over and over again (I think to check different screen orientations or something? It's stupid). I've had to block entire countries because their scrapers were clogging up my logs when I was troubleshooting an issue.

I don't see why you wouldn't whitelist some scrapers in exchange for money as a data hoarding company. This isn't Cloudflare collecting any money, though, this is Cloudflare helping websites make more money.

AlienRobot · 2024-09-23T18:14:02 1727115242

I think this is a temporary problem. In a few years many AI companies will run out of VC money, others will be only after "low-background" content made before AI spam. Maybe one day nature will heal.

paxys · 2024-09-23T14:46:20 1727102780

> Common Crawl runs once and exposes the data in industry standard formats like WARC for other consumers

And what stops companies from using this data for model training? Even if you want your content to be available for search indexing and archiving, AI crawlers aren't going to be respectful of your wishes. Hence the need for restrictive gatekeeping.

lolinder · 2024-09-23T15:01:44 1727103704

Either AI training is fair use or it isn't. If it's fair use then businesses shouldn't get a say in whether the data can be used for it. If it isn't, then the answer to your question is copyright law.

Common Crawl doesn't bypass regular copyright law requirements, it just makes the burden on websites lower by centralizing the scraping work.

6gvONxR4sf7o · 2024-09-23T15:19:29 1727104769

Its not a legal question but a behavior and sustainability question. If it is fair use, but is undesirable for content makers, then they’re still not under any obligation to allow scraping. So they’ll try stuff like this, and other more restrictive bot blockers.

Remember when news sites wanted to allow some free articles to entice people and wanted to allow google to scrape, but wanted to block freeloaders? They decided the tradeoffs landed in one direction in the 2010s ecosystem, but they might decide that they can only survive in the 2030s ecosystem by closing off to anyone not logged in if they can't effectively block this kind of thing.

nitwit005 · 2024-09-24T06:03:29 1727157809

In the end the websites always lose that battle if humans are willing to put effort into sharing it. You see people just pasting full article text or summaries into reddit comments. Those people are probably subscribers.

Aachen · 2024-09-23T15:59:36 1727107176

Copyright is only part of the equation, there's also the use of other people's resources

If what a government receptionist says is copyright-free, you still can't walk into their office thousands of times per day and ask various questions to learn what human answers are like in order to train your artificial neural network

The amount of scraping that happened in ~2020 as compared to 2024 is orders of magnitude different. Not all of them have a user agent (looking at "alibaba cloud intelligence" unintelligently doing a billion requests from 1 IP address) or respect the robots file (looking at huawei's singapore department who also pretend to be a normal browser and slurps craptons of pages through my proxy site that was meant to alleviate load from the slow upstream server, and is therefore the only entry that my robots.txt denies)

lolinder · 2024-09-23T18:24:00 1727115840

But here we're talking about Common Crawl being included in this scheme, which is explicitly designed to make it easier to use them than to make your own bad robot.

You block Common Crawl and all you'll be left with is the abusive bots that find workarounds.

chii · 2024-09-23T17:13:07 1727111587

> you still can't walk into their office thousands of times per day

why not?

Esp. if that receptionist is an automaton, and isn't bothered by you. Of course, if you end up taking more resources and block others from asking as well, then you need to observe some etiquette (aka, throttle etc).

Aachen · 2024-09-23T17:34:49 1727112889

> why not? Esp. if that receptionist is an automaton, and isn't bothered by you

I chose "thousands" to keep it within the realm of possibility while making it clear that it would bother a human receptionist precisely because humans aren't automatons, making the use of resources very obvious.

If you need an analogy to understand how an automated system could suffer from resources being consumed, perhaps picture a web server and billions of requests using a certain amount of bandwidth and CPU time each. Wait, now we're back to the original scenario!

MrDarcy · 2024-09-23T15:12:59 1727104379

There is no objective black and white is or is not in this situation.

There is litigation of multiple cases and a judge making a judgement on each one.

Until then, and even after then, publishers can set the terms and enforce those terms using technical means like this.

sensanaty · 2024-09-24T09:01:34 1727168494

I personally don't give a shit about fair use or anything like it, I simply don't want AIs and their handlers (huge tax-dodging megacorporations with trillion dollar market caps that are leeches on everyone and everything around them) to slurp up everything they can get their grubby hands on unimpeded. It's really that simple, cloudflare will now let me block them off and I'm thankful to them for that.

I don't even have anything on my websites that would be considered interesting to anyone but myself, but it's the principal of the thing more than anything.

toomuchtodo · 2024-09-23T15:58:37 1727107117

The end result is browser extensions, like Recap the Law [1] for PACER, that streams data back from participating user browsers to a target for batch processing and eventual reconciliation.

Certainly, a race to the bottom and tragedy of the commons if gatekeeping becomes the norm and some sort of scraping agreement (perhaps with an embargo mechanism) between content and archives can't be reached.

[1] https://free.law/recap/faq

billyhoffman · 2024-09-23T14:57:33 1727103453

Licensing. Common Crawl could change the license of how the data it produces is used.

Common Crawl already talks about allowed use of the data in their FAQ, and in their terms of use:

https://commoncrawl.org/terms-of-use/ https://commoncrawl.org/faq

While this doesn't currently discuss AI, they could. This would allow non-AI downstream consumers to not be penalized.

paxys · 2024-09-23T15:00:33 1727103633

Licensing doesn't mean shit when no court in the country is actually willing to prosecute violations. Who have OpenAI, Anthropic, Microsoft, Google, Meta licensed all their training data from?

_hyn3 · 2024-09-23T16:55:03 1727110503

Copyright infringement is a civil matter.

paxys · 2024-09-23T17:31:24 1727112684

And where do you think civil matters are handled?

_hyn3 · 2024-09-23T17:59:19 1727114359

In the U.S., civil cases are litigated by opposing attorneys in front of a judge, often without a jury, which differs from criminal cases led by prosecutors. Prosecutors (e.g., local DAs, AGs, DOJ) handle criminal trials, not civil ones like (usually) IP infringement.

If people are exploiting your work unfairly, it's on you to take legal action in civil court. Just be aware the statute of limitations is short (often 1-4 years depending on the state), so consult a real attorney quickly. (I'm not a lawyer, so this isn't legal advice!)

ToucanLoucan · 2024-09-23T15:55:26 1727106926

I mean, this is exactly what people like myself were predicting when these AI companies first started spooling up their operations. Abuse of the public square means that public goods are then restricted. It's perfectly rational for websites of any sort who have strong opinions on AI to forbid the use of common crawl, specifically because it is being abused by AI companies to train the AI's they are opposed to.

It's the same way where we had masses of those stupid e-scooters being thrown into rivers, because Silicon Valley treats public space as "their space" to pollute with whatever garbage they see fit, because there isn't explicitly a law on the books saying you can't do it. Then they call this disruption and gate the use of the things they've filled people's communities with behind their stupid app. People see this, and react. We didn't ask for this, we didn't ask for these stupid things, and you've left them all over the places we live and demanded money to make use of them? Go to hell. Go get your stupid scooter out of the river.

account42 · 2024-09-24T16:00:09 1727193609

> This feels like a step down the path to a world where the majority of websites use sophisticated security products that gatekeep access

And I'm sure Buttflare will be more than happy to sell those products.

sfmike · 2024-09-24T05:35:19 1727156119

already sites like perplexity have been completed blocked by cloudflare due to some meta signal and can't even load it. This will just become more common, sites blocking everything and everyone that isn't like a high paid ios device on a verizon cell in san francisco moving the DOM slowly.

nonrandomstring · 2024-09-23T17:41:33 1727113293

> There are significant knock-on effects

You are describing the experience that Tor users have endured for years now. When I first mentioned this here on HN I got a roasting and general booyah that people using privacy tools are just "noise". Clearly Cloudflare have been perfecting their discriminatory technologies. I guess what goes around comes around. "first they came for the...." etc etc.

Anyway, I see a potential upside to this, so we might be optimistic. Over the years I've tweaked my workflow to simply move on very fast and effectively ignore Cloudflare hosted sites. I know... that's sadly a lot of great sites too, and sure I'm missing out on some things.

On the other hand, it seems to cut out a vast amount of rubbish. Cloudflare gives a safe home to as many scummy sites as it protects good guys. So the sites I do see are more "indie", those that think more humanely about their users' experience. Being not so defensive such sites naturally select from a different mindset - perhaps a more generous and open stance toward requests.

So what effect will this have on AI training?

Maybe a good one. Maybe tragic. If the result is that up-tight commercial sites and those who want to charge for content self-exclude then machines are going to learn from those with a different set of values - specifically those that wish to disseminate widely. That will include propaganda and disinformation for sure. It will also tend to filter out well curated good journalism. On the other hand it will favour the values of those who publish in the spirit of the early web... just to put their own thing up there for the world.

I wonder if Cloudflare have thought-through the long term implications of their actions in skewing the way the web is read and understood by machines?

shadowgovt · 2024-09-23T18:01:17 1727114477

> This feels like a step down the path to a world where the majority of websites use sophisticated security products that gatekeep access to those who pay and those who don't

... and that future has been a long time coming. People who want an alternative to advertising-supported online content? This is what that alternative looks like. Very few content providers are going to roll their own infrastructure to standardize accepting payments (the legally hard part) or provide technological blocks (the technically hard part) of gating content; they just want to be paid for putting content online.

Terr_ · 2024-09-23T20:38:21 1727123901

> People who want an alternative to advertising-supported online content? This is what that alternative looks like.

Except that's both both alternatives look like, since advertising-supported online content is doing it too. Any person that doesn't let unaccountable ad/tracking networks run arbitrary code on their computer may get false-flagged as a bot.

billyhoffman · 2024-05-13T15:41:50 1715614910

It would be interesting to get better SaaS metrics, specifically on customer churn. I can't find that in the linked document. They are growing, but how much of that marketing spend is needed for growth vs just maintaining the customer base?

I don't know what the customer makeup of Squarespace is. It could be that by volume their typical customers are business that fail a lot, or have a high amount of churn (think sole proprietorships, businesses with 1 - 10 employees without in house tech experience, or part-time/side businesses like Etsy or Instagram stores). Depending on the makeup, a significant amount of this marketing could be required just to maintain a constant customer base.

JumpCrisscross · 2024-05-13T16:29:33 1715617773

> could be that by volume their typical customers are business that fail a lot, or have a high amount of churn...significant amount of this marketing could be required just to maintain a constant customer base

If this is the case, Permira and its lenders are bailing the public out of a shit business on their own dime.

billyhoffman · 2024-05-11T02:42:56 1715395376

The game port was amazing. You got 4 ADCs (for the X axis and Y axis of a joystick, 2 joysticks supported) and you got 4 GPIO bins for the "buttons" (2 buttons per joystick, again 2 joysticks supported with 1 port).

If you didn't care about burning CPU, you could bit bang these "button" inputs and interface simple home-brew electronics. The clock line attaches to one "button" pin, the data line attaches to another "button" pin. When button 1 is "pressed" sample the state of button 2! Now you are reading a data stream.

I used this trick to interface magstripe readers directly to computers back in the early 2000s, and even wrote an article for the first issue of O'Reilly's Make Magazine about it. While professional readers/writers were hard to get, cost $100+, interfaced to the parallel or serial port, and used proprietary software, this let me do it for around ~$20 in parts and use my own software. I had quite a lot of fun learning what was stored on the various tracks of the cards I had, like my student id.

https://stripesnoop.sourceforge.net

Fun times of directly accessing hardware.

billyhoffman · on April 18, 2024

Mustard did an awesome analysis of this exactly this: Flying Cruise Ships, why we don’t have giant air ships

https://youtu.be/LyaYaFzSPac?si=hZ6HV0YuCOg-WUYT

Basically, the economics no longer work. Airships really can’t carry that much weight relative to their operational costs. They’re loud and they’re slow which isn’t a great customer experience. The amount of money you would have to pay for a ticket to make it economical would be more than a first class ticket on a modern jet, which would also get you there much faster.

(BTW all of Mustard’s videos are great. They have super high production value and are primarily about transportation technologies of the 50s 60s 70s and 80s. I recommend watching it on Nebula if you can.)

jvanderbot · on April 18, 2024

Cruise ships are a terrible way to get quickly to a destination, but they're not a bad way to travel.

Comparing airships to first class air travel isn't the right basis. You want to compare to trains and ships at least. You take a ship or train when speed isn't the objective, rather experience. An airship experience with a few dozen passengers taking tours of beautiful scenery over the course of a coastal trip, or like a group hot air balloon, or something like that.

zht · on April 18, 2024

`You take a ship or train when speed isn't the objective, rather experience`

not sure if that's the case with HSR