
Thank you for adding to the considerable weight of literature completely missing the point of Dr. Knuth's literate software, which is:

tr, sort, uniq, and sed should all be literate programs.

They would be easier to read, reason about, modify, and extend. At this point, tooling for literate programming lags considerably compared to illiterate programming, and that's entirely because of the determination to miss the point exhibited here.

Too bad really.



As I wrote in another thread: try to read the TeX source

https://mirror-hk.koddos.net/CTAN/systems/knuth/dist/tex/tex...

and compare it with the coreutils source code

https://github.com/coreutils/coreutils/blob/master/src/tr.c

Which is easier to read and understand?


There are several orders of magnitude in complexity between tr and TeX, making such a comparison fruitless.

That said, have a gander here: http://brokestream.com/tex.pdf

Not nearly as good as the hardcover, which has a proper table of contents and index.

The key is to imagine 40 years of progress along these lines. I can't imagine our default target would be paper.


The PDF still doesn't help much. The expository style of breaking out inner code blocks from their call sites harms the ability to understand what's happening. It's nearly impossible to follow in the raw source. Hyperlinks don't improve matters much, and the PDF rendering doesn't have a rational layout for details like numeric tables.

Try unraveling the numeric code in Metafont:

http://tug.ctan.org/tex-archive/systems/knuth/dist/mf/mf.web

http://www.tug.org/texlive//devsrc/Master/texmf-dist/doc/gen...


The first implements a programming language and typesetting system, while the second just swaps characters. I'm not sure it's a fair comparison. (Also, the TeX source is meant to be read typeset, not raw.)


The TeX source is not in the form in which it is intended to be read. It's like showing the raw HTML and JavaScript source of an article and complaining that the message is hard to read.

What you showed as the TeX source is the source representation for this book:

Computers & Typesetting, Volume B: TeX: The Program (Reading, Massachusetts: Addison-Wesley, 1986), ISBN 0-201-13437-3

and, at the same time, the "plain" Pascal program can be extracted from that same source representation.(1)

That was the idea of Literate Programming, which Knuth also tried to demonstrate in the article where he was "framed."

What other hard-to-develop program (this one took ten years of one of the world's best programmers, supported by his students and assistants) has a nicely printed book form that fits in 600 pages, descriptions and all?

Even more impressively, Knuth intentionally developed his program with the specific goal that its output would stay the same no matter how much computers change in the future. And he managed to achieve this -- numerous ports of his original program are available everywhere, and building from his sources from the eighties still produces exactly the same pages as it did then.

------------

1) "WEB programs are converted to Pascal sources by tangle and to a TeX input file by weave. Of course, tangle and weave are WEB programs as well. So one needs tangle to build tangle---and weave and TeX to read a beautifully typeset WEB program" -- that is, if you don't buy a book which is already typeset and printed.


From my point of view, it is the second.

Also, from my point of view, I am MUCH MORE conditioned to read the second variant of code.

My third point is that the TeX source must be read in PDF form, not in the TeX form. I would even go so far as to say that TeX source code must be manipulated in PDF (read: readable) form, not as the source code per se.


The syntax highlighting helps a lot.

  for (size_t i = 0; i < bytes_read; i++)
    buf[i] = xlate[to_uchar (buf[i])];

TIL: tr does not support UTF-8.
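A quick way to see why: tr translates byte by byte through a 256-entry table, as the loop quoted from tr.c does, so any multibyte UTF-8 character gets mangled. A minimal Python sketch (the `make_xlate` helper is hypothetical, just mimicking tr.c's `xlate` array):

```python
def make_xlate(src: bytes, dst: bytes) -> bytes:
    """Build a 256-entry translation table, like tr's xlate array."""
    table = bytearray(range(256))  # identity mapping by default
    for s, d in zip(src, dst):
        table[s] = d
    return bytes(table)

# tr 'a' 'o' works fine on single-byte ASCII:
tab = make_xlate(b"a", b"o")
print(b"cat".translate(tab))  # b'cot'

# But 'é' is two bytes in UTF-8 (0xC3 0xA9); translating the byte
# 0xA9 in isolation corrupts the sequence, leaving invalid UTF-8.
tab = make_xlate("é".encode()[1:2], b"?")
print("café".encode().translate(tab))  # b'caf\xc3?'
```

A table indexed by byte value simply has no way to represent a mapping between multibyte sequences, which is why byte-oriented tr cannot be UTF-8-aware.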


From TeX:

> last:=first; {cf.\ Matthew 19\thinspace:\thinspace30}

har har har { eyeroll }


Literate programs are essays. They might be easy to read and reason about, but not to modify or extend. A computer program is not best understood and managed as a linear artifact. Much of its power is in its graph nature.

Documentation comments are great, but that's not the same as literate programming.


I don't think a literate program is more or less linear than the source code that is extracted/tangled from it. Both artifacts have a sequence: for a C program, the tangled version would put the #includes before declarations before definitions, for example.

In that sense, the LP program is an alternate linearization of the program, in that the authors can choose the order in which to introduce the program. But few LP programs are naively linear -- they typically impose a tree structure on the code, made up of labelled sections and subsections. Readers don't have to start at line 1 of the program/essay, they can navigate from the table of contents to the section of interest.

A compelling argument for LP is that it's an additive technology. If you don't want to read the essay, that's fine -- just tangle the code, and read the source-code artifact instead. With the right tooling (which admittedly may not exist!) an IDE could let you edit the tangled version directly, and put your edits back into the "essay" at the right places, so round-trip editing would be feasible.
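To make the "just tangle the code" step concrete, here is a minimal tangle sketch. It assumes a noweb-like syntax (`<<name>>=` starts a named chunk, `@` ends it, and `<<name>>` inside a chunk splices another chunk in); a real tool like noweb or WEB handles much more.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")

def parse_chunks(source: str) -> dict:
    """Collect the named code chunks out of a literate source file."""
    chunks, name = {}, None
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        if m:
            name = m.group(1)
            chunks.setdefault(name, [])
        elif line.strip() == "@":
            name = None          # back to prose
        elif name is not None:
            chunks[name].append(line)
    return chunks

def tangle(chunks: dict, name: str, indent: str = "") -> str:
    """Expand chunk references recursively into plain source code."""
    out = []
    for line in chunks[name]:
        m = CHUNK_REF.match(line)
        if m:
            out.append(tangle(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line)
    return "\n".join(out)

essay = """\
<<*>>=
<<includes>>
int main() { return 0; }
@
The boilerplate can live in an appendix, far from the narrative:
<<includes>>=
#include <stdio.h>
@
"""
print(tangle(parse_chunks(essay), "*"))
```

Note that the prose order and the tangled order are independent: the includes chunk is defined last in the essay but spliced first in the output, which is exactly the reordering freedom LP promises.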


I think I understand part of the problem. Many "literate programs" aren't literate in Knuth's sense; they are merely inversions of the conventional model, where text is the default and code is the special case that has to be demarcated. Things like the literate markdown I've seen typically read like a regular program with extra text:

  # A Literate Program
  This is a literate program, the language is C. We'll
  begin with the includes because that's what C has at
  the start of every C file, and not because it makes
  any sense for the presentation:
  ```
  #include <stdio.h>
  ...
  ```

  Here are the declarations, you can ignore these for
  now.
  ```
  int main();
  double square(double x);
  ```

  Now that that's out of the way, ...
If that's all most people see, then they haven't actually seen the benefit of LP, where you can push that boilerplate to an appendix so no one has to see it unless they're changing the libraries used by the system, or some other thing that's important but less essential to the understanding that LP tries to promote.


That's an excellent point. I said "most literate programs aren't linear" in my comment... But I wasn't considering the low-effort linear style that many people actually use, so I'm probably wrong on that. :)

"Low-effort linear literate" is a useful style, but I think it falls quite short of what Knuth had in mind.


I refer to this as semiliterate programming.

I've found it to be a useful bootstrap toward a properly literate environment, which will require considerable tooling support to provide a reasonably modern experience.

Happily, we have the Language Server Protocol now, so many of the key components are already in place...


Exactly.

It's not fair to compare 40(!) years of advances in tooling surrounding the pile-of-files approach to software against the somewhat withered-on-the-vine approach embodied in literate programming.

It's a road not taken, and I think that's a pity, so I'm doing something about it <shrug>


I hope you'll post something about your progress. It's not an easy problem to tackle, but hopefully a fun one. :)


I hope to have a Show HN ready by summer!


> they typically impose a tree structure on the code, made up of labelled sections and subsections.

But most non-trivial programs don't have a tree structure; they are a full directed graph: methods calling other methods, classes inheriting from other classes or implementing interfaces, etc. Programming and debugging can involve traversing and modifying this graph in almost any order, and it does not lend itself to one preferred linearization.


An LP style that somehow reflected the various graphs of a program (control flow, inheritance, etc.) might be very interesting! I'm not sure what it would look like, but it sounds like a starting point for experimentation.

A big program could be broken into modules -- as we already do -- and each module (or its sub-modules) could be documented in a literate style, independently from the other modules. Maybe the graphs of the program (or at least, the graphs of its highly-connected modules) could be presented as alternate trails, indices, tables of contents, etc., each with its own accompanying narrative overview. It sounds gnarly, but not impossible! On the other hand, losing linearity altogether seems to be an anti-goal for an inherently narrative programming style like LP. If you're telling a story (about code, or anything else), eventually you have to put one sentence before the next. At some level of granularity, you have to commit to straight lines.

When he wrote his book, it's clear that Knuth was very much aware that LP was an unusual, and possibly crazy, idea. The book is written in a humble style, like an invitation to explore a design space with him, and not as a prescriptive text. For example, I think the fact that he included McIlroy's full critique in the book speaks to his intentions. I guess my point is that Knuth would probably love that you're challenging his ideas and exploring the design space with him, rather than dismissing LP outright.


> But most non-trivial programs don't have a tree structure,

Pretty much all programs have at least one tree structure that covers the entire program (the AST), though that may or may not be the most interesting view of the program structure.
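Python's standard `ast` module makes that spanning tree easy to see, for example:

```python
import ast

# The whole module parses into a single tree rooted at a Module node.
tree = ast.parse("def square(x):\n    return x * x\n\ny = square(3)")

# Walking the tree visits every node exactly once; the call graph
# (square invoked at top level) is something you derive *from* this
# tree, not the tree itself.
for node in ast.walk(tree):
    print(type(node).__name__)
```

So the tree view always exists, even for a program whose call and inheritance relationships form an arbitrary directed graph; it is just rarely the view a reader cares about.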


One such program that I know of and use in conjunction with pandoc is enTangleD[1], which supports editing on both sides, at the cost of comments embedded in the source to keep track of blocks. All that's really required is folding those comments away to get halfway to the desired IDE.

[1]: https://entangled.github.io


> Literate programs are essays. They might be easy to read and reason about, but not modify or extend.

What's your rationale behind that last part? Have you ever used a notebook-style interface like Mathematica or Jupyter? It's perfectly feasible to tear out a chunk of material and replace it with something else, if you've organized it well. This is no different from the same constraint on conventionally written software. You can't easily refactor shitty code. You can't easily refactor shitty literate code. If you organize your code well and write quality code, whether in the literate or conventional style, you can refactor it with relative ease.

> A computer program is not best understood and managed as a linear artifact. Much of its power is in its graph nature.

And how does that conflict with literate programming? Literate programming permits the reorganization of code into any arbitrary structure. Which means that the graphical nature of the code can be made even more obvious than in most conventional languages. You're no longer bound by single-module files (see Java) or other arbitrary textual constraints. You can place the code in the place that makes the most sense for explication. Or put it adjacent to where it's used, even if it ends up tangled in a different file.



