There is no need to extend the syntax of robots.txt.
At the time of crawling the robots.txt is parsed anyway.
If it excludes part of the site or the whole site from crawling, that should IMHO be respected, and that day's crawl should be stopped (and the pages NOT even archived); if it doesn't exclude them, then crawling and archiving them is "fair".
The point here is that by adding a "new" robots.txt, the "previously archived" pages (which remain archived) are no longer displayed by the Wayback Machine.
It is only a political/legal (and unilateral) decision by the good people at archive.org; it could be changed at any time, at their discretion, without the need for any "new" robots.txt syntax.
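To make the crawl-time check concrete: a crawler that respects robots.txt typically parses the file once and then tests each URL before fetching. A minimal sketch using Python's standard-library parser (the rules and URLs here are illustrative, not archive.org's actual configuration):

```python
from urllib.robotparser import RobotFileParser

# Parse an illustrative robots.txt that excludes /private/ for all crawlers.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A well-behaved crawler checks each URL before fetching/archiving it.
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False: skip
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True: fetch
```

Under this model the decision is made once, at crawl time; nothing in the protocol itself says anything about retroactively hiding pages that were legitimately crawled earlier.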
I think that enabling the user to selectively suppress parts of the site for certain archived time spans is a better solution. Sometimes a page might have been in temporary violation of a law or contract, and that version needs to be suppressed. But that particular violation does not mean that any other version needs to be hidden as well.
>I think that enabling the user to selectively suppress parts of the site for certain archived time spans is a better solution.
But that is easily achieved by politely asking the good people at archive.org, they won't normally decline a "reasonable" request to suppress this or that page access.
As a side note, there is something that (when it comes to the internet) really escapes me. In the "real" world, before everything was digital, you had lawful means to get a retraction in case of, say, libel, but you weren't allowed to retroactively change history by destroying all written records; attempts like burning books in public squares weren't much appreciated. I don't really see why going digital should be so different.
I guess that the meaning of "publish" in the sense of "making public by printing it" has been altered by the common presence of the "undo" button.
Another sign I am getting old (and grumpy), I know.
Well, you are missing the part where producing and selling new copies of e.g. libelous works can be forbidden in the real world. So the old copies will still be around, but they have to be passed on privately. Effectively, this takes affected works out of circulation.
No, actually that was exactly the example I gave. In the case of libel, someone with recognized authority (a court) can seize/impound the libelous material and prohibit further publication (without, of course, destroying each and every copy in the wild). But that procedure is very different from someone (remember: not necessarily the actual author, but merely whoever owns the domain/site at a given moment) being able to prevent access to archived material published in the past, material that is not libelous and does not violate any law, only because he/she can.