Back when search engines first caused this problem, the industry agreed among itself and designed the robots.txt spec precisely to head off legal frameworks that would have stopped them. Because of that, no legal frameworks were ever created.
Now there's a new generation of hungry hungry hippo indexers that never agreed to that, feel intense competitive pressure to scoop up as much data as they can, and simply ignore it.
Legislation should have been made anyway, with those who ignore robots.txt blocked, fined, throttled, etc.
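For context on what's being ignored: robots.txt is purely advisory, and honoring it is entirely up to the crawler. Here's a minimal sketch of what a well-behaved crawler does, using Python's standard urllib.robotparser (the URL and user-agent string are just placeholders):

```python
import urllib.robotparser

# Fetch and parse the site's robots.txt (hypothetical example URL).
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A compliant crawler checks before every request; nothing enforces this.
if rp.can_fetch("ExampleBot", "https://example.com/articles/some-page"):
    print("allowed: fetch the page")
else:
    print("disallowed: skip it")
```

The point is that compliance is voluntary: a crawler that skips this check hits no technical barrier, which is exactly why any enforcement would have to come from outside the protocol.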
Unethical behaviour always has a huge advantage over ethical behaviour; that's nothing new, and it's pretty much true by definition. The only way to prevent a race to the bottom is to make the unethical behaviour illegal or unprofitable.
I don't know if "robots.txt" appearing in the congressional record really counts. Do any of the decision makers appear to have a command of what the file does? Or do they defer to industry professionals, as they often do?
How would legislation in the US or EU stop traffic from China or Thailand or Russia? At best you'd be fragmenting the internet, which isn't really a "best"; that's a terrible idea.
This is the key point. But if US laws are being violated and AI is treated as a matter of national security, that could be used by the US government in international negotiations, and as justification for sanctions, etc. It would be a good deterrent.