Expanding TeX's \Newif (2021) (mht.wtf)
45 points by martinhath on May 2, 2023 | 5 comments


Nice post. Blog posts on TeX's internals are rare; see also Graham Douglas's posts on his blog (http://www.readytext.co.uk/?cat=14) and on Overleaf: https://www.overleaf.com/learn/latex/Articles/A_New_Series_o...

I once had the idea of writing a "TeX debugger" that would separately show the function of TeX's "eyes", "mouth", "stomach", etc. (actual terms used by Knuth). It would have a bunch of different pages: one for the characters of the input file(s) (the input stack), one for the token stream that will be seen next, one for the "commands", one for the lists (horizontal lists, vlists, mlists) currently being assembled, one for paragraph-breaking, one for page-breaking, etc.

So if you were curious about what was going on, or ran into an error, you would be able to peer into exactly which part it was coming from. Then I realized that with typical macro-heavy use of TeX (like LaTeX, or even plain TeX), almost all of the action of interest to typical users would be in the expansion stage.
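TeX's built-in tracing parameters already give a crude version of this view; for example (these are stock primitives):

    \tracingonline=1    % echo the trace to the terminal, not just the .log file
    \tracingmacros=2    % log every macro expansion together with its arguments
    \tracingcommands=1  % log each command as the "stomach" executes it

Their combined output is famously verbose, which is part of what a dedicated debugger would improve on.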

I had hacked up an initial prototype (posted here: https://tex.stackexchange.com/a/391131) that shows what goes on with the following TeX input (spaces added here for readability):

    \expandafter \expandafter \expandafter \meaning \expandafter \uppercase \expandafter{a}
    \end
It may be interesting to exhume it and see what it does with the `\newif` from this post. The way I was doing it at the time is pretty slow (running TeX under gdb and exporting prodigious amounts of output), and the output format is pretty unhelpful/confusing, but these days I think it may make sense to compile TeX to WASM and have everything running in the browser. I haven't touched it in 6 years, but if anyone is interested in working on such a TeX debugger I can share ideas. :-)
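For readers who haven't met \expandafter: it expands the token two ahead exactly once before putting things back. A minimal sketch (the macro names are mine):

    \def\b{B}
    \def\a{\b}
    % Without \expandafter, \c would be defined as the unexpanded token \a;
    % the three \expandafter's expand \a exactly one step before \def runs:
    \expandafter\def\expandafter\c\expandafter{\a}
    \show\c   % reports: \c=macro:->\b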

My motivations were twofold:

- Most TeX users don't really understand what's going on, and this mismatch in mental models leads to much confusion, overly complex macros, etc.

- There are a lot of great ideas in TeX which newer typesetting systems may overlook. One problem is that TeX is too monolithic; I'm hoping that if people could see its parts separately, some of these ideas would be easier to grasp.


My port of the New Typesetting System (NTS), called KeenType, whittles the Java-based implementation down to depending on only a few of Knuth's files, namely "plain.tex" and "hyphen.tex":

https://github.com/DaveJarvis/KeenType/tree/main/tex/src/mai...

Getting familiar with the fonts required understanding the difference between the font metrics (TFM files) and the fonts themselves. To make matters a little less straightforward, Knuth created a special character mapping for indices into the fonts. It was not easy to find a font that mapped those glyphs exactly; the closest was BaKoMa:

https://github.com/DaveJarvis/KeenType/tree/main/tex/src/mai...

This required hard-coding a mapping between Knuth's code points and the actual code points in the target font:

https://github.com/DaveJarvis/KeenType/blob/989dbe26f68eda75...
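To illustrate the encoding issue in plain-TeX terms (my example, not KeenType code): slot 16 in the Computer Modern layout holds a dotless i where ASCII puts a control character, so naive code-point pass-through to a modern font picks the wrong glyphs:

    \font\cmr=cmr10 \cmr
    \char16   % prints a dotless i; ASCII code point 16 is the DLE control code
    \bye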

Plain TeX has a lot of moving parts, so it's great to see more people writing about the topic.


I thought I'd seen it all when I learnt how you could implement a conditional with function calls in lambda calculus (the point being you don't want to evaluate the branch you're not taking).

But TeX, of course, manages to take that to new heights.
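For the curious: a Church boolean is just a function that selects one of its two arguments, and the same trick translates almost directly into TeX macros (a sketch, with names of my own choosing):

    \def\churchtrue#1#2{#1}    % select the first branch
    \def\churchfalse#1#2{#2}   % select the second branch
    % \churchtrue{yes}{no} expands to "yes"; the "no" branch is
    % discarded without ever being expanded, just as in the lambda calculus.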


I've always used \iffalse as a way to comment out text in TeX without losing syntax highlighting, and never thought much about its main use. It's cool to see that it's a crucial ingredient in the definition of \newif (this SE answer [0] also goes into more detail; tl;dr: the basic \if is bad at parsing nested \fi's).

[0] https://tex.stackexchange.com/questions/46377/what-is-iffals...
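For context, standard \newif usage: one declaration manufactures the conditional and both of its setters.

    \newif\ifdraft   % defines \ifdraft, \drafttrue, and \draftfalse
    \drafttrue       % set the flag
    \ifdraft Draft copy.\else Final copy.\fi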


This is great. I've crossed paths with TeX three times professionally, and a bit more with LaTeX, currently with DBLaTeX. It's the best open-source typesetting system that I know of, with XSL far behind, Prawn close after that, and the newer web-print engines unfortunately in last place - at least for the moment.

Getting writers who are willing to write in it is another story. Also - something finally related to the OP - the core TeX markup is missing functionality that's been high fashion in Doculand for the last few decades: conditionals and transclusion, a.k.a. the core functionality of Component Content Systems (CCSs).

Fifteen years ago, the TeX world is where I first encountered technologists who flagged both of those as bad ideas. This was before I had put my finger on what exactly was causing CCSs to fail; back then I just chalked it up to bad vendors or my own incompetence. The TeX graybeard's take was, "there's no way the work you're saving will be worth the complexity incurred by letting everyone use includes everywhere." I was honestly a little exasperated with what sounded like an overly broad blanket statement, and eventually I found Asciidoc as an XML replacement. Asciidoc supports conditionals and transclusion, and thus can confidently be labelled a CCS. I felt like I had split the atom: open source, includes, conditionals, transclusion, XML interoperability, good print, alright tool support.

This Asciidoc CCS also ultimately failed, becoming unmaintainable after a mere two years of producing documentation for a rapidly changing product line chock full of pie-in-the-sky widgets.

In other words, it failed, like every other CCS I have worked on, seen, or heard about for almost twenty years. SGML, XML, DocBook, S1000D, DITA, Flare, Frame, Epic, you name it; it doesn't matter what tools you bring to the table - same result. CCSs require technical communicators to know more about the product than is possible at the first stage of the product lifecycle. You know, when documentation is most needed. The conditionals and transclusions you set up at the beginning of a product's life will almost certainly be wrong, and they are far more difficult to refactor than a unified document, so the refactoring doesn't happen on a rev cycle, and now you have a gigantic mess. You'd have been better off in Word. CCS is, for the vast majority of applications, a trap.

Yet CCS remains the obsession of the technical communication space, like the Information Problem still is for old-school socialists. Indeed, outside of the Soviet system, I can't think of a time industry has focused so single-mindedly on such an inherently unachievable - even absurd - goal. Why is that?

There's no limit to the amount of money you can make selling bosses products that promise to shrink their headcounts, and that's the first part of the answer. Document-system development is also typically paid for out of separate funds for improvements - capital and systems budgets and such - so the money spent detangling a broken CCS doesn't come out of the tech pubs boss's budget. He sees a net gain, even if his writers' lives are hell on earth.

But this isn't the most important reason.

The most important reason is Product Maturity Theatre. When you roll out a new product with a CCS documentation system backing it, it gives the appearance of a product whose components and variants are so fully architected, so frictionless and interchangeable, that the document components might as well be made of naked gymnasts in an oil wrestling contest. It's all a show, of course. That CCS is built from noisy, horrifying source, and no one works in it. It'll never get updated, ever[1]. It's a Potemkin village. But look how well architected it is.

Now think about what kind of company would be so preoccupied with the sloppiness of its own product that it's willing to fully employ twenty people to build an insanely detailed cardboard facade just to convince prospects otherwise. Seriously, think about that company: you've just imagined the worst possible user of a component content system. And that's the most likely person to try to buy one.

[1] I just checked an IETM image that I know was pushed in 2013 to see if it's still there. Yup. Issue 000. So wait, what are the techs in the field using to do their jobs? Word files originally ripped from the IETM HTML, hacked up by engineers and clandestinely redistributed via email with no version control, no system, nothing. For a frickin' decade. That IETM is a piece of $20 million taxidermy.

[1][1] Note to the note: they're building a WHOLE NEW IETM SYSTEM this year. <MILITARY BRANCH> is absolutely banning engineering docs on site - specifically Word docs, which is interesting. "What, we already had IETMs?" Yeah, they didn't work. Want to know why? "Nope!" Eh, alright. What's another forty million between friends.



