Open file on disk. See that it's 404. Delete file. Re-run crawler.
You'd turn that into code with grep -R 404 . (swapping in whatever the actual unique error string is) and deleting any file containing the match. (You'd be careful not to run that recursive delete on any unexpected data.)
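A minimal sketch, assuming the crawl output lives in ./crawl-output and that "Not Found" is the unique string on the bad pages (substitute whatever your target site actually returns):

    # List files containing the error string, NUL-separated so odd
    # filenames survive, then delete them and re-run the crawl.
    grep -rlZ 'Not Found' ./crawl-output | xargs -0 rm --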
Really, these problems are pretty easy. It's easy to overthink them.
This isn't 1995 anymore. When you hit a 404 error, you no longer get Apache's default 404 page. You really can't count on there being any consistency between 404 pages on different sites.
If wget somehow stored the header response info to disk (e.g. "FILENAME.header-info") you could whip something up to do what you are suggesting though.
Yeah, wget can store response info to disk (--save-headers writes the HTTP headers at the top of each saved file). Besides, even if it couldn't, you could still visit a 404 page on the site and pick out a unique string of text to search for.
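Something like this, assuming bash, GNU wget, and a hypothetical example.com mirror (note wget normally discards error bodies, so you may also need --content-on-error for the 404 pages to land on disk at all):

    # --save-headers prepends the HTTP response headers to each saved
    # file, so the status line is the file's first line.
    wget --mirror --save-headers --content-on-error http://example.com/

    # Delete any saved file whose status line reports a 404.
    find example.com -type f -print0 | while IFS= read -r -d '' f; do
        head -n 1 "$f" | grep -q ' 404 ' && rm -- "$f"
    done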