He says in the book (right after introducing that example), "if you have trouble with import std;, try the old-fashioned and conventional #include <iostream> ..." And he shows a snippet with that.
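For anyone curious, the fallback snippet is roughly this shape (paraphrased from memory, not a verbatim copy of the book's code):

    // old-fashioned header include instead of: import std;
    #include <iostream>

    int main()
    {
        std::cout << "Hello, World!\n";
    }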
To really appreciate the need for awk, imagine writing one-liners and scripts in the late 80s, when Perl and Python weren't around. The associative arrays in awk were a game changer. Of course, today there is no need to use awk for multi-line, complex scripts, because Python or Perl does the job better (and both languages scale better). However, awk is still quite useful for one-liners. But developers who never use the one-liner paradigm of pipelines on the command line don't realize what they're missing.
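A classic illustration of the associative-array point (my own example, with file.txt as a placeholder, not something from the book) is the word-frequency one-liner:

    # count word frequencies with an associative array, print the top 10
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print count[w], w }' file.txt | sort -rn | head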
Brian Kernighan mentions in the book that awk provides "the most bang for the programming buck of any language--one can learn much of it in 5 or 10 minutes, and typical programs are only a few lines long" [p. 116, UNIX: A History and Memoir]. Also keep in mind the famous quote/signature line from Larry Wall (the inventor of Perl): "I still say awk '{print $1}' a lot."
A pointer to the study that you're probably thinking of is research done by Paul Nation and Robert Waring (vocabulary researchers). They cite their own 1985 study and a 1989 study with the following quote: "With a vocabulary size of 2,000 words, a learner knows 80% of the words in a text which means that 1 word in every 5 (approximately 2 words in every line) are unknown. Research by Liu Na and Nation (1985) has shown that this ratio of unknown to known words is not sufficient to allow reasonably successful guessing of the meaning of the unknown words. At least 95% coverage is needed for that. Research by Laufer (1989) suggests that 95% coverage is sufficient to allow reasonable comprehension of a text. A larger vocabulary size is clearly better." [1]
Interesting. I haven't actually seen that one. I was thinking of a paper from McGill University which, I think, discussed whether having annotations which translate certain words (the name of which escapes me at the moment) helps with free reading. If you can think of the name for those annotations (the ones you can often find in English graded readers), then you can probably find the paper (search for that, "free reading", and "McGill").
However, I have no doubt that the paper I was reading references this one. Great find. Thanks!
> annotations which translate certain words (the name of which escapes me at the moment)
Perhaps you're thinking of glosses? I'm not familiar with graded readers, but I am rather familiar with mediæval manuscripts, wherein it's common to see copies of Latin works with little annotations near certain (sometimes all) words. When a gloss has been added above (or below) the word(s) it annotates, it's said to be interlinear, but marginal glosses aren't unheard of either.
The glosses were sometimes written by the same scribe that made the copy, but often they appear to have been added later, perhaps by the owner of the book—sometimes in a comically small hand so as to fit in narrow spaces :)
Yes. That's it. Thanks :-) Unfortunately I still can't find the paper. Oh well, the one provided above is quite good. Probably if one searched for papers that cite that one, it would uncover a lot of interesting new work.
I wonder how this applies to Chinese/Japanese? I can sometimes guess the meaning of words I don't know because I recognize the kanji. Sometimes even if I don't recognize the kanji, but recognize the radicals, I can still guess. I'm not advanced enough for this to happen often, but it does happen.
In Japanese, it is a useful strategy. Knowing the common meanings of all the jouyou kanji is very useful. My main problem when reading that way is that I often don't know all the readings. I still need to look up the word to find out how to pronounce it.
One of the things I'll miss is the "upgrade a registered O'Reilly print book to an ebook for $4.99" option. Always nice to have both the print and [affordable] ebook versions. Bummer.
That and their daily half-off sales have resulted in me spending far too much money on their books... At least they generally have good editing and a pretty high quality bar, better than I've seen with Packt or Apress.
I learned Awk in 1988 before Perl was around (on our systems, anyway). It was super useful at the time. But if you know Perl and Perl is available on your system, there's certainly not a compelling need for writing standalone, multi-line Awk programs. But Awk is really, really useful for one-liners. As Larry Wall has said: "I still say awk '{print $1}' a lot."
Brian Kernighan himself, in this 2015 talk [1] on language design, states that Awk was primarily intended for one-liner usage (he mentions this at 20:43).
I don't have it in writing or video, but in 2008 at ACM Reflections/Projections at University of Illinois I was involved in a long conversation with Larry Wall and Al Aho. It was largely about the history and lineage of programming languages.
Al said that if Perl had existed first there wouldn't have been an awk. I pointed out that parts of Perl are inspired by awk and might otherwise have been inspired by SNOBOL or ICON, at which point everyone present seemed to agree that we should be thankful for awk. I take it as high praise when Al Aho defers to your tool.
I was just reminiscing with Larry about that discussion last week at The Perl Conference in Arlington. He said he had fond memories of that conversation, and that he and Al went for lunch together the next day, too. I'd have loved to be there for that.
Once I learned Perl I never used awk or sed again. Even for one-liners, with the -n, -p, and -a options you can easily write one-liners in Perl that are as concise as those in Awk.
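For instance, the usual side-by-side (my own illustration, with file.txt as a placeholder):

    # print the first whitespace-separated field of each line
    awk '{print $1}' file.txt
    perl -lane 'print $F[0]' file.txt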
I've written reasonably complicated stuff in awk (like a page or two of code). Probably could have solved those problems more elegantly with another tool, but I never learned perl, and I find awk simple enough that the man page is all I need to refresh my memory. For text manipulation that's a one-off or likely to not need further maintenance, I think it's great.
Back at Netcraft, for the data-crunching pipelines behind the surveys, we tended to start off with sed+awk for the expressivity/concision of the operations and then rewrite in Perl later for performance.
Yes, agreed! You'd think most of the CS people in a room listening to a Lamport lecture, of all things, would've at some point in their education taken a theory of computation or similar course. The notation Lamport was asking about is already taught by page 7 of chapter 0 in Sipser [1].
Linus' "good" version has a McCabe cyclomatic complexity of 2, whereas the "bad" version has a value of 3. So, objectively, one could argue there is improvement there (albeit small). Validation of the "good" version will be easier (e.g. code coverage testing) with fewer paths through the code. Additionally, a lower cyclomatic complexity typically implies less stress on the developer's working memory while reading code (since you don't have to consume an internal brain "register" holding the result of a conditional while following the flow of control).
It seems like the purpose of this book significantly overlaps with Cal Newport's _So Good They Can't Ignore You: Why Skills Trump Passion in the Quest for Work You Love_ [1]. Newport gave corresponding talks on this topic at Google [2] and elsewhere [3], and they cover his book's main ideas.
I wouldn't go so far as to say that the Dragon Book is outdated and irrelevant. (I'm assuming you're referring to the 2nd edition from 2006.) Unless you're focusing on back-end optimization and code generation techniques (something a new compiler writer typically does NOT do), the bulk of the theory and material you'd cover in a first-semester compiler course is fairly static.
But if a person is merely looking to bang out a compiler without getting overwhelmed with how to convert NFAs to DFAs for lexing, etc., some good alternative books are:
A Retargetable C Compiler: Design and Implementation, by Hanson and Fraser (http://www.amazon.com/Retargetable-Compiler-Design-Implement...). This book constructs and explains the code for a full C compiler using a recursive descent approach (no flex/lex or bison/yacc); a toy sketch of the recursive descent flavor follows this list. I have some experience augmenting this compiler, so I can vouch for the book's ability to clearly convey their design.
Compiler Design in C, by Allen Holub (http://www.holub.com/software/compiler.design.in.c.html). Downloadable PDF at that link as well. A book from 1990 in which Holub constructs his own version of lex and yacc, and then builds a subset-C compiler which generates intermediate code.
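To give the general flavor of recursive descent (a toy sketch of my own, not code from either book): one small function per grammar rule, each calling the others directly, with no generated tables.

    #include <ctype.h>
    #include <stdio.h>

    /* Grammar:  expr -> term (('+' | '-') term)*
                 term -> digit+                    */

    static const char *p;          /* cursor into the input string */

    static int term(void)          /* term -> digit+ */
    {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        return v;
    }

    static int expr(void)          /* expr -> term (('+' | '-') term)* */
    {
        int v = term();
        while (*p == '+' || *p == '-')
            v = (*p++ == '+') ? v + term() : v - term();
        return v;
    }

    int main(void)
    {
        p = "12+34-5";
        printf("%d\n", expr());    /* prints 41 */
        return 0;
    }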
It is actually an outdated view to split a compiler into dedicated, monolithic front-end and back-end parts. The more modern approach is very much the opposite: a nearly continuous, very long sequence of very simple transforms, rewriting the code seamlessly all the way down from the front-end (i.e., the parser) to a back-end or multiple back-ends. And this approach is very alien to anything you'd find in the Dragon Book.
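To make that concrete, here is a toy flavor of the "many tiny rewrites" idea (entirely my own sketch, not any real compiler's IR): each pass is a small tree-to-tree transform, and the driver just chains them.

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { NUM, VAR, ADD, MUL } Kind;
    typedef struct Expr { Kind kind; int num; struct Expr *l, *r; } Expr;

    static Expr *mk(Kind k, int n, Expr *l, Expr *r)
    {
        Expr *e = malloc(sizeof *e);
        e->kind = k; e->num = n; e->l = l; e->r = r;
        return e;
    }

    /* pass 1: fold constant subtrees */
    static Expr *fold(Expr *e)
    {
        if (e->kind == NUM || e->kind == VAR) return e;
        e->l = fold(e->l);
        e->r = fold(e->r);
        if (e->l->kind == NUM && e->r->kind == NUM)
            return mk(NUM, e->kind == ADD ? e->l->num + e->r->num
                                          : e->l->num * e->r->num, NULL, NULL);
        return e;
    }

    /* pass 2: rewrite x * 2 as x + x (a stand-in for strength reduction) */
    static Expr *reduce(Expr *e)
    {
        if (e->kind == NUM || e->kind == VAR) return e;
        e->l = reduce(e->l);
        e->r = reduce(e->r);
        if (e->kind == MUL && e->r->kind == NUM && e->r->num == 2)
            return mk(ADD, 0, e->l, e->l);
        return e;
    }

    int main(void)
    {
        /* x * (1 + 1): fold turns (1 + 1) into 2, reduce turns x * 2 into x + x */
        Expr *e = mk(MUL, 0, mk(VAR, 0, NULL, NULL),
                             mk(ADD, 0, mk(NUM, 1, NULL, NULL),
                                        mk(NUM, 1, NULL, NULL)));
        Expr *(*passes[])(Expr *) = { fold, reduce };
        for (size_t i = 0; i < sizeof passes / sizeof *passes; i++)
            e = passes[i](e);
        printf("root after passes: %s\n", e->kind == ADD ? "ADD" : "other");
        return 0;
    }

The point is that each rewrite is small enough to understand and test on its own, and the pipeline is just a list of them.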
As for parsing, as I already said elsewhere in this thread, the techniques from the Dragon Book are not practical any more and are not used in modern compilers. There are far better ways, which are not covered in the book, and they're far simpler; they hardly even deserve a dedicated book.