OCR generally does not work as you describe. The common case is for the OCR syst...

thaumasiotes · on May 22, 2015

I've read plenty of Kindle books that were clearly the product of OCR. True, a "cl" printed where a "d" belongs hasn't reduced the image of a lowercase d to a single byte, but that was the intention. Don't confuse OCR, the concept, with OCR-as-implemented-in-a-particular-way, or with a-process-that-we-called-OCR-because-OCR-is-involved-at-some-point. OCR is any system that recognizes sections of image data as matching letter shapes[1].

"Generating a font from the image and replacing the original image data with that" is a very good description of what's going on here.

[1] Or numbers, or symbols like parentheses. The basic concept is letters.
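
To make "generating a font from the image and replacing the original image data with that" concrete, here is a rough sketch of the idea, not any particular product's implementation. The names (`build_font`, `render`, `max_diff`) and the tiny 3x5 bitmaps are made up for illustration. Each glyph image is compared against the shapes already collected; if one is close enough, the glyph is stored as a reference to it instead of as pixels.

```python
from typing import List, Sequence, Tuple

# A glyph is a small bitmap: one string per pixel row, '#' = ink, '.' = blank.
Glyph = Tuple[str, ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Count differing pixels; glyphs of different dimensions never match."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 10**9
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_font(glyphs: Sequence[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Greedily build a 'font' of exemplar shapes plus one index per glyph.

    max_diff is a hypothetical tolerance: how many pixels may differ before
    two glyph images are treated as different shapes.
    """
    font: List[Glyph] = []
    codes: List[int] = []
    for glyph in glyphs:
        for index, exemplar in enumerate(font):
            if hamming(glyph, exemplar) <= max_diff:
                codes.append(index)  # reuse an existing shape
                break
        else:
            font.append(glyph)  # first time this shape is seen
            codes.append(len(font) - 1)
    return font, codes


def render(font: Sequence[Glyph], codes: Sequence[int]) -> List[Glyph]:
    """Rebuild the page by looking each index up in the font."""
    return [font[i] for i in codes]


SIX: Glyph = ("###", "#..", "###", "#.#", "###")
EIGHT: Glyph = ("###", "#.#", "###", "#.#", "###")
page = [SIX, EIGHT, SIX]

exact_font, exact_codes = build_font(page, max_diff=0)
loose_font, loose_codes = build_font(page, max_diff=1)
```

With `max_diff=0` the page comes back unchanged and the repeated 6 is stored only once. With `max_diff=1` the 8 is judged "the same letter shape" as the 6 and comes back as a 6, which is exactly the kind of recognition step that makes this OCR in the broad sense.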