This is a helpful post because it gets to the heart of the difference. Many people are saying "if you do HTML in a particular way, you get the same benefits." I'm asking "what's inherent to the form?" That's exactly the point about C--you can write it in a way that provably terminates, but termination isn't guaranteed. Consider the consumer's perspective.
When I land on a page that's a PDF, I know certain things--I can easily save it and read it later. How do I know that? Not because I have read the PDF spec, or know that much about it, but because of my experience as a consumer of the web.
When I land on an arbitrary web page, do I know the same thing? No. I don't know what the page is doing, and I don't know what my browser will do when I try to save the page. When I save this page, I have the option to save HTML only, or a complete web page. Will the complete page actually work? I go into the source, and there's a link to the JavaScript (which is saved locally). Does rendering the page rely on that JavaScript? Does that JavaScript make XHR or fetch calls? Since it's Hacker News, I suspect the answer is no. However, that's not inherent to the medium.
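To make that "go into the source" step concrete, here is a rough sketch of the kind of check I mean, using only Python's standard library. The names `OfflineAudit` and `audit_saved_page` are made up for illustration, and the substring match on `fetch(` and `XMLHttpRequest` is a crude heuristic: it only looks at inline scripts, it will miss calls hidden in external or minified files, and a match doesn't prove the page needs the network to render.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Substrings that hint at runtime network access (a heuristic, not proof).
NETWORK_HINTS = ("fetch(", "XMLHttpRequest")


class OfflineAudit(HTMLParser):
    """Collects remote resource URLs and inline scripts that look networked."""

    def __init__(self):
        super().__init__()
        self.remote_urls = []      # resources the saved page still loads remotely
        self.network_hints = []    # hint substrings found in inline scripts
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            self._in_script = True
        # `src` (scripts, images) and `href` on <link> are fetched at render time;
        # plain <a href> links are ignored because they are only navigation.
        url = attrs.get("src") or (attrs.get("href") if tag == "link" else None)
        if url and (urlparse(url).scheme in ("http", "https") or url.startswith("//")):
            self.remote_urls.append(url)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.network_hints.extend(h for h in NETWORK_HINTS if h in data)


def audit_saved_page(html_text):
    """Return (remote_urls, network_hints) for a saved HTML document."""
    parser = OfflineAudit()
    parser.feed(html_text)
    parser.close()
    return parser.remote_urls, parser.network_hints
```

Two empty lists suggest the saved copy is probably self-contained; anything else means you'd have to read the scripts to be sure. Which is the point: an average user of the web isn't going to run this.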
There are better ways to archive the content of even dynamic, JS-heavy pages, but they are not things that you learn as an average user of the web.
It's possible to write PDFs that don't "work" (for some useful definition of "work" similar to the case with HTML) offline. Please stop pretending that's not true.
The reason PDFs tend to work offline more often is that PDF is not generally regarded as the default format for online content, which is in turn a matter of social effects rather than technical capacity. Reverse the socially accepted roles of the two document formats and watch the same complaints get made against PDFs as you're making against HTML. I'd bet money the "normal" state of affairs would remain the same in terms of how benefits and detriments are perceived to split between online and offline formats; the only change would be which format was considered which.
. . . but then all the web would be even heavier documents, and even less customizable for local viewing, thanks in part to that pagination and strict formatting situation.
It's possible, but it takes work. I can't remember the last time a PDF did something unreadably weird. Usually my only gripe is with a scan of an old document where whoever turned it into a PDF didn't run OCR.
I don't really follow. How does this author converting their entire site to PDF help readers/visitors/users?
The original HTML site[0] was printable as PDF, and save-able as both HTML and "Web page, complete", all of which result in a well-formatted & readable offline experience. (It was also responsive: very readable on mobile, but that's an aside).
The new PDF site is not accessible to some, is difficult to read on mobile, and interacts poorly with all of the norms web users are accustomed to (back navigation, anchors, etc.).
It's the difference between "this thing has X property" (termination or able to save for offline reading) and "this thing _obviously_ has X property, in a way that you can tell without any expertise, or doing any investigation".
How important this is to users, or whether it is worth it, is something I've not commented on, but it is a difference.