Hacker News

> fonts that are designed to obfuscate the internal text, e.g. re-arranging characters or splitting glyphs up in strange ways, etc. It's worth noting that with the exception of the spaces, these techniques are used deliberately to stop people extracting text or the copyrighted fonts from the document.

Maybe (and, for the fonts, likely), but I don’t think it’s the only reason. Subsetting embedded fonts makes PDFs smaller, often a lot smaller. Why embed an entire font when the document uses only a single glyph from it as a bullet point? Why include Chinese, Japanese, etc. glyphs if the document doesn’t use them?

Even if it’s possible to subset without changing the code point to glyph mapping (is it? I don’t know enough about fonts to answer that), implementing it may be simpler, or result in smaller files, if one makes the embedded font dense in code points. (I tried finding an answer, but soon remembered how complex fonts are, and gave up.)
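
To make the "dense in code points" idea concrete, here is a toy sketch in Python. It is not how any real PDF writer or font format works; the function names (`build_subset`, `encode`, `decode`) are made up for illustration. It only shows why renumbering the used characters as 1, 2, 3, ... makes the stored text meaningless on its own, and why extraction then depends on a reverse map being shipped with the document (in real PDFs, I believe the ToUnicode CMap plays that role).

```python
def build_subset(text):
    """Assign dense codes (1, 2, 3, ...) to the distinct characters in text.

    Code 0 is left unused here, loosely mirroring the convention that
    glyph 0 is the "missing glyph" slot. This is an assumption of the toy model.
    """
    char_to_code = {}
    for ch in text:
        if ch not in char_to_code:
            char_to_code[ch] = len(char_to_code) + 1
    return char_to_code


def encode(text, char_to_code):
    """Turn text into the list of dense codes the document would store."""
    return [char_to_code[ch] for ch in text]


def decode(codes, char_to_code):
    """Recover text from dense codes using the reverse map.

    Codes missing from the map come back as U+FFFD, which is roughly
    what a text extractor is left with when no reverse map is available.
    """
    code_to_char = {code: ch for ch, code in char_to_code.items()}
    return "".join(code_to_char.get(code, "\ufffd") for code in codes)


mapping = build_subset("abca")
codes = encode("abca", mapping)   # [1, 2, 3, 1]
restored = decode(codes, mapping)  # "abca"
```

The subset only needs as many entries as there are distinct characters, no matter how large the original font is. The stored codes have nothing to do with Unicode any more, so this looks like obfuscation even when saving space was the only goal.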

And of course, modern tools _should_ output accessible PDF documents, which means text extraction _should_ work. I don’t know how well that works in reality, but I have my doubts.


