
I'd urge writers to release in as many formats as possible. Personally, I prefer reading PDFs on my iPad because the typography is often superior to ePub readers.


Hmmm? Pop an ePub into iBooks on your iPad and you can change the font, size, etc. to your heart's content, and it will reflow nicely. Not so with a PDF.


I'm reading the Cambridge Grammar of the English Language at the moment. It has a complex layout with varied typography, examples with subscripts and superscripts, text that's left, center, and right aligned, callout boxes, sidebars, diagrams, tables, multiple heading levels, footnotes and so on. ePub simply can't handle complex layouts or typographic requirements beyond those you'd find in a novel. Also, the justification and hyphenation algorithms in ePub readers are typically complete crap.


Yes, I find there are two kinds of books: those that would work as audio books and those that wouldn't. Those that work as audio books also work well as ePub and, in fact, are better in ePub form than as PDF, because I have more control over the presentation.

Books that wouldn't work very well as audio books usually won't work well as ePub either. Charts, maps, complex typography, etc., that would be lost in audio form are mostly lost in ePub too. I don't know how much of that loss is due to the ePub format, how much to ePub viewer apps, how much to ePub creation tools, and how much to creators just not trying very hard, but the combination makes most high-quality textbooks much better as PDF, while "beach literature" is better as ePub.


I think ePubs can handle everything you mentioned; ePubs are just XHTML and a subset of CSS [1]. Not sure about callout boxes, since I don't know exactly what subset of elements is available (I've never written one), but everything else you've mentioned looks to be available [2].

Fun ePub file trick: rename the file to .zip, run unzip -a yourbook.zip (double-click unzip doesn't work on OS X for me for some reason), and check out the HTML, CSS, images and XML of your book.

[1] https://en.wikipedia.org/wiki/EPUB [2] https://www.w3.org/TR/2005/WD-xhtml2-20050527/elements.html
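
The unzip trick above also works without renaming anything, since an ePub is an ordinary ZIP archive. Here is a minimal Python sketch, using only the standard library, that lists what is inside one. The sample archive built by build_sample_epub is hypothetical and far from a complete, valid ePub; it only exists so the example runs without a real book on disk.

```python
import io
import zipfile


def build_sample_epub() -> bytes:
    """Build a tiny ePub-like ZIP in memory (hypothetical, not a valid book)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # In a real ePub the 'mimetype' entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html><body><p>Hello</p></body></html>")
        zf.writestr("OEBPS/style.css", "p { text-align: left; }")
    return buf.getvalue()


def list_epub(data: bytes) -> list:
    """Return the names of all files stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def read_entry(data: bytes, name: str) -> str:
    """Return one archive member decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    epub = build_sample_epub()
    for name in list_epub(epub):
        print(name)
    print(read_entry(epub, "OEBPS/style.css"))
```

For a real book, zipfile.ZipFile("yourbook.epub") opens the file directly, whatever its extension.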


While ePubs can handle more complicated layouts, it's much easier to take an existing print book layout and export to PDF and have it retain the formatting than it is to do the same with ePub. So to export the same layout to ePub, you basically need a web designer to go over it and mark up the parts correctly and write the right CSS. Exporting to PDF is pretty much one step and you're done.


>the justification and hyphenation algorithms in ePub readers are typically complete crap.

IMO, ePub readers have the best justification algorithm: flush left, and the best hyphenation algorithm: never hyphenate. PDFs often force full justification on you, which makes it harder to read because all the line endings have the same spacing, making it easy to lose your position in the text.


I actually doubt that "flush left only" and "no hyphenation" are the only, or even standard, options for most readers.

But I think there's a synthesis possible between what you mean and what OP meant: ePub readers are somewhat better at adjusting to the reader. "Reader" here in terms of both the device and the actual human. Text can reflow, columns are used only where they make sense, and settings can be changed.

PDFs are better at allowing the creator to, well, create. I know there's a hard-core demographic that doesn't believe in any sort of visual design. Yet there are topics and authors where choices of placement, font, color etc. are made with intent, and to good effect. Everything by Tufte comes to mind.

It should be possible, with something like Apple's iBooks Author, to create such designs that work well within the e-book format. But I haven't seen any examples, probably because I rarely read textbooks these days and tend to buy paper copies of such books anyway.

Insofar as I have seen illustrations in non-PDF formats, the results are dismal, with all sorts of sizing problems etc.


> making it easy to lose your position in the text

I don't have that problem. I find ragged-right with no hyphenation less pleasant to read and to look at. When I'm reading a novel on my Kindle, I have it set ragged-right because the Kindle doesn't justify well (it only adjusts word spacing, I think). Same on the web. But when I can get good justification and hyphenation, like in a properly typeset PDF or physical book, I prefer it.


The CSS text-reflow engine (which ePub relies upon) doesn’t spend nearly as much time on the layout of your page as e.g. LaTeX. Thus, the results are far inferior to what you get by putting the book through a professional rendering engine and outputting a PDF with a page size targeted to the given device.

This isn’t anything to do with PDF, of course; it’s just representing the baked result of a text rendering process. Your device could do that rendering itself just fine; it’d just take a while (much longer than it takes to change font family/font size and see the result in your favourite eBook reader.)

It’d be interesting to see an ePub-like standard that kept the text as semantic text, and rendered “at runtime”, but not “in realtime”; rather using a set of render preferences you set up beforehand to render the book once when you first download/sync it. Sort of like how runtimes like .NET dynamically recompile downloaded bytecode to native code to “specialize” the code for the device they’re running on. With such an approach, you could rely on a render model that actually spends time doing good, constraint-based rendering of the text to avoid rivers/widows/orphans/excessive hyphens/etc.

You could get pretty far in cutting down the perceived time of this render by just splitting each work into chapters and rendering a chapter at a time. You would be able to start reading once the first chapter finished rendering. Which, for most normal types of books, on the sort of CPU an eBook-reader has, would “only” take ~10 seconds.


There was an interesting conversation about the performance of line-breaking algorithms recently. David Fuchs, who worked with Knuth on the line-breaking algorithms in TeX, commented that the algorithm in TeX isn't especially performance intensive. It's not included in browsers because high-quality line breaking doesn't play well with floats.

https://news.ycombinator.com/item?id=19785968

https://news.ycombinator.com/item?id=19473277
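
To make the difference between greedy line breaking and paragraph-level optimisation concrete, here is a rough Python sketch. It is not TeX's algorithm: there is no stretchable glue, no hyphenation and no penalties, and the cost function (squared trailing space on every line except the last) is an assumption picked for illustration. It only shows the general "choose all breaks together" idea that Knuth-Plass style breaking builds on. The names greedy_lines and break_lines are made up for this example.

```python
def greedy_lines(words, width):
    """First-fit: put each word on the current line if it still fits."""
    lines, current = [], ""
    for w in words:
        candidate = w if not current else current + " " + w
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = w
    if current:
        lines.append(current)
    return lines


def break_lines(words, width):
    """Pick all breaks at once, minimising the sum of squared trailing
    spaces over every line except the last (simplified cost model)."""
    if any(len(w) > width for w in words):
        raise ValueError("a word is longer than the line width")
    n = len(words)
    inf = float("inf")
    best = [inf] * (n + 1)       # best[i]: minimal cost of setting words[i:]
    next_break = [n] * (n + 1)   # next_break[i]: index where the next line starts
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        line_len = -1            # cancels the space counted before the first word
        for j in range(i, n):
            line_len += 1 + len(words[j])
            if line_len > width:
                break
            # The last line of the paragraph is allowed to be short for free.
            cost = 0.0 if j == n - 1 else float((width - line_len) ** 2)
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                next_break[i] = j + 1
    lines, i = [], 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


if __name__ == "__main__":
    words = "aaa bb cc ddddd".split()
    print(greedy_lines(words, 6))  # ['aaa bb', 'cc', 'ddddd']
    print(break_lines(words, 6))   # ['aaa', 'bb cc', 'ddddd']
```

The greedy version fills the first line completely and leaves "cc" stranded on a very short second line; the optimising version accepts a slightly shorter first line to even things out. This naive loop does O(n × words-per-line) work, which is small for ordinary paragraphs.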


Thanks for the shout-out, but all the credit for the line-breaking in TeX goes to Knuth and Plass; I was just in the nearby vicinity, and like to make authoritative pronouncements. The linked comments do represent my accurate measurements of running the unadulterated TeX code from Knuth's hand on modern hardware.


I found this great overview of the algorithm, which makes the same point as your comment: that the algorithm being quadratic is a ‘commonly known fact’ among programmers, but is not really true:

https://github.com/jaroslov/knuth-plass-thoughts/blob/master...

It’s always fascinating how these older algorithms and research are still highly relevant, and how their implementation is still being debated.


As another datapoint, Knuth-Plass style line breaking is implemented on Android TextView and turned on by default, as of Marshmallow I think. There is a bit of a performance cost, but I personally never came across anybody who turned it off to gain more speed.


Yeah, not really. Papers and scientific books / math books usually are very dependent on layout and PDF is just the better choice here.


That assumes it's just a flow. A lot of the time books have sidebars and other elements that don't just flow into an ePub. Footnotes also don't work as well.


I'd much rather have a PDF. I can open it in Documents (from Readdle), which is super-fast and lets me quickly page through large documents, which is essential for reference-type material or textbooks.

I am also sensitive to bad typography and sadly most ePub readers fail miserably at this. I don't want a "blob of text" with random font sizes.


Some books are not meant to have any changes to fonts, size or reflow. "House of Leaves" is one example of a book formatted specifically for printed books which does not translate well to ePub. (source: I read this book in epub)

When it comes to preserving the intended layout, PDF simply translates better.


Doesn't look so good for programming books, especially for books with source code presented as formatted text and not as images.


I urge them to release the raw TeX so I can compile into a pdf with screen size that matches my kindle.

I actually did this for a paper or two, and it turns out long math expressions break things, but otherwise it's not too bad.

I wouldn't mind small-page PDFs being released alongside the large ones, either.


Same. Especially with technical books that include illustrations, graphs, etc.


And math equations, still a pain....


Agreed. Especially for technical books with code snippets, tables, etc., PDF is the best.

I've yet to see a programming book in epub/mobi format that is even reasonably readable.


In that spirit, wasn't DocBook designed to be a source language for as many formats as possible? The Wikipedia article[1] says:

> As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, Web help and HTML Help, without requiring users to make any changes to the source. In other words, when a document is written in DocBook format it becomes easily portable into other formats. It solves the problem of reformatting by writing it once using XML tags.

I've never understood how to compile to a target format, though.

[1] https://en.wikipedia.org/wiki/DocBook



