Open file on disk. See that it's 404. Delete file. Re-run crawler.
You'd turn that into code with grep -R 404 . (swapping in whatever the actual unique error string is) and deleting any file containing the match. (You'd be careful not to run that recursive delete on any unexpected data.)
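A minimal sketch, assuming the crawl output lives in ./crawl-output and that "Not Found" is the unique string on the bad pages (substitute whatever your target site actually returns):

    # List files containing the error string, NUL-separated so odd
    # filenames survive, then delete them and re-run the crawl.
    grep -rlZ 'Not Found' ./crawl-output | xargs -0 rm --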
Really, these problems are pretty easy. It's easy to overthink them.
This isn't 1995 anymore. When you hit a 404 error, you no longer get Apache's default 404 page. You really can't count on there being any consistency between 404 pages on different sites.
If wget somehow stored the header response info to disk (e.g. "FILENAME.header-info") you could whip something up to do what you are suggesting though.
Yeah, wget can store response info to disk (--save-headers writes the HTTP headers at the top of each saved file). Besides, even if it couldn't, you could still visit a 404 page on the site and pick out a unique string of text to search for.
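Something like this, assuming bash, GNU wget, and a hypothetical example.com mirror (note wget normally discards error bodies, so you may also need --content-on-error for the 404 pages to land on disk at all):

    # --save-headers prepends the HTTP response headers to each saved
    # file, so the status line is the file's first line.
    wget --mirror --save-headers --content-on-error http://example.com/

    # Delete any saved file whose status line reports a 404.
    find example.com -type f -print0 | while IFS= read -r -d '' f; do
        head -n 1 "$f" | grep -q ' 404 ' && rm -- "$f"
    done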