He says in the book (right after introducing that example), "if you have trouble with import std;, try the old-fashioned and conventional #include <iostream> ..." And he shows a snippet with that.
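For anyone curious, the fallback snippet is roughly this shape (paraphrased from memory, not a verbatim copy of the book's code):

    // old-fashioned header include instead of: import std;
    #include <iostream>

    int main()
    {
        std::cout << "Hello, World!\n";
    }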
To really appreciate the need for awk, imagine writing one-liners and scripts in the late 80s, when Perl and Python weren't around. The associative arrays in awk were a game changer. Of course, today there is no need to use awk for multi-line, complex scripts, because Python or Perl does the job better (and both languages scale better). However, awk is still quite useful for one-liners. But developers who never use the one-liner paradigm of pipelines on the command line don't realize what they're missing.
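A classic illustration of the associative-array point (my own example, with file.txt as a placeholder, not something from the book) is the word-frequency one-liner:

    # count word frequencies with an associative array, print the top 10
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print count[w], w }' file.txt | sort -rn | head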
Brian Kernighan mentions in the book that awk provides "the most bang for the programming buck of any language--one can learn much of it in 5 or 10 minutes, and typical programs are only a few lines long" [p. 116, UNIX: A History and Memoir]. Also keep in mind the famous quote/signature line from Larry Wall (the inventor of Perl): "I still say awk '{print $1}' a lot."
A pointer to the study that you're probably thinking of is research done by Paul Nation and Robert Waring (vocabulary researchers). They cite their own 1985 study and a 1989 study with the following quote: "With a vocabulary size of 2,000 words, a learner knows 80% of the words in a text which means that 1 word in every 5 (approximately 2 words in every line) are unknown. Research by Liu Na and Nation (1985) has shown that this ratio of unknown to known words is not sufficient to allow reasonably successful guessing of the meaning of the unknown words. At least 95% coverage is needed for that. Research by Laufer (1989) suggests that 95% coverage is sufficient to allow reasonable comprehension of a text. A larger vocabulary size is clearly better." [1]
Interesting. I haven't actually seen that one. I was thinking of a paper from McGill University which, I think, discussed whether having annotations which translate certain words (the name of which escapes me at the moment) helps with free reading. If you can think of the name for those annotations (the ones you can often find in English graded readers), then you can probably find the paper (search for that, "free reading", and "McGill").
However, I have no doubt that the paper I was reading references this one. Great find. Thanks!
> annotations which translate certain words (the name of which escapes me at the moment)
Perhaps you're thinking of glosses? I'm not familiar with graded readers, but I am rather familiar with mediæval manuscripts, wherein it's common to see copies of Latin works with little annotations near certain (sometimes all) words. When a gloss has been added above (or below) the word(s) it annotates, it's said to be interlinear, but marginal glosses aren't unheard of either.
The glosses were sometimes written by the same scribe that made the copy, but often they appear to have been added later, perhaps by the owner of the book—sometimes in a comically small hand so as to fit in narrow spaces :)
Yes. That's it. Thanks :-) Unfortunately I still can't find the paper. Oh well, the one provided above is quite good. Probably if one searched for papers that cite that one, it would uncover a lot of interesting new work.
I wonder how this applies to Chinese/Japanese? I can sometimes guess the meaning of words I don't know because I recognize the kanji. Sometimes even if I don't recognize the kanji, but recognize the radicals, I can still guess. I'm not advanced enough for this to happen often, but it does happen.
In Japanese, it is a useful strategy. Knowing the common meanings of all the jouyou kanji is very useful. My main problem when reading that way is that I often don't know all the readings. I still need to look up the word to find out how to pronounce it.
One of the things I'll miss is the "upgrade a registered O'Reilly print book to an ebook for $4.99" option. Always nice to have both the print and [affordable] ebook versions. Bummer.
That and their daily half-off sales have resulted in me spending far too much money on their books... At least they generally have good editing and a pretty high quality bar, better than I've seen with Packt or Apress.
I learned Awk in 1988 before Perl was around (on our systems, anyway). It was super useful at the time. But if you know Perl and Perl is available on your system, there's certainly not a compelling need for writing standalone, multi-line Awk programs. But Awk is really, really useful for one-liners. As Larry Wall has said: "I still say awk '{print $1}' a lot."
Brian Kernighan himself, in this 2015 talk [1] on language design, states that Awk was primarily intended for one-liner usage (he mentions this at 20:43).
I don't have it in writing or video, but in 2008 at ACM Reflections/Projections at University of Illinois I was involved in a long conversation with Larry Wall and Al Aho. It was largely about the history and lineage of programming languages.
Al said that if Perl had existed first there wouldn't have been an awk. I pointed out that parts of Perl are inspired by awk and might otherwise have been inspired by SNOBOL or ICON, at which point everyone present seemed to agree that we should be thankful for awk. I take it as high praise when Al Aho defers to your tool.
I was just reminiscing with Larry about that discussion last week at The Perl Conference in Arlington. He said he had fond memories of that conversation, and that he and Al went for lunch together the next day, too. I'd have loved to be there for that.
Once I learned Perl I never used awk or sed again. Even for one-liners, with the -n, -p, and -a options you can easily write one-liners in Perl that are as concise as those in Awk.
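For instance, the usual side-by-side (my own illustration, with file.txt as a placeholder):

    # print the first whitespace-separated field of each line
    awk '{print $1}' file.txt
    perl -lane 'print $F[0]' file.txt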
I've written reasonably complicated stuff in awk (like a page or two of code). Probably could have solved those problems more elegantly with another tool, but I never learned perl, and I find awk simple enough that the man page is all I need to refresh my memory. For text manipulation that's a one-off or likely to not need further maintenance, I think it's great.
Back at Netcraft, for the data-crunching pipelines behind the surveys, we tended to start off with sed+awk for the expressivity/concision of the operations and then rewrite in Perl later for performance.
Yes, agreed! You'd think most of the CS people in a room listening to a Lamport lecture, of all things, would've at some point in their education taken a theory of computation or similar course. The notation Lamport was asking about is already taught by page 7 of chapter 0 in Sipser [1].
Linus' "good" version has a McCabe cyclomatic complexity of 2, whereas the "bad" version has a value of 3. So, objectively, one could argue there is improvement there (albeit small). Validation of the "good" version will be easier (e.g. code coverage testing) with fewer paths through the code. Additionally, a lower cyclomatic complexity typically implies less stress on the developer's working memory while reading code (since you don't have to consume an internal brain "register" holding the result of a conditional while following the flow of control).
It seems like the purpose of this book significantly overlaps with Cal Newport's _So Good They Can't Ignore You: Why Skills Trump Passion in the Quest for Work You Love_ [1]. Newport gave corresponding talks on this topic at Google [2] and elsewhere [3], and they cover his book's main ideas.
I wouldn't go so far as to say that the Dragon Book is outdated and irrelevant. (I'm assuming you're referring to the 2nd edition from 2006.) Unless you're focusing on back-end optimization and code generation techniques (something a new compiler writer typically does NOT do), the bulk of the theory and material you'd cover in a first-semester compiler course is fairly static.
But if a person is merely looking to bang out a compiler without getting overwhelmed with how to convert NFAs to DFAs for lexing, etc., some good alternative books are:
A Retargetable C Compiler: Design and Implementation, by Hanson and Fraser (http://www.amazon.com/Retargetable-Compiler-Design-Implement...). This book constructs and explains the code for a full C compiler using a recursive descent approach (no flex/lex or bison/yacc); a toy sketch of the recursive descent flavor follows this list. I have some experience augmenting this compiler, so I can vouch for the book's ability to clearly convey their design.
Compiler Design in C, by Allen Holub (http://www.holub.com/software/compiler.design.in.c.html). Downloadable PDF at that link as well. A book from 1990 in which Holub constructs his own version of lex and yacc, and then builds a subset-C compiler which generates intermediate code.
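To give the general flavor of recursive descent (a toy sketch of my own, not code from either book): one small function per grammar rule, each calling the others directly, with no generated tables.

    #include <ctype.h>
    #include <stdio.h>

    /* Grammar:  expr -> term (('+' | '-') term)*
                 term -> digit+                    */

    static const char *p;          /* cursor into the input string */

    static int term(void)          /* term -> digit+ */
    {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        return v;
    }

    static int expr(void)          /* expr -> term (('+' | '-') term)* */
    {
        int v = term();
        while (*p == '+' || *p == '-')
            v = (*p++ == '+') ? v + term() : v - term();
        return v;
    }

    int main(void)
    {
        p = "12+34-5";
        printf("%d\n", expr());    /* prints 41 */
        return 0;
    }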
It is actually an outdated view to split a compiler into dedicated, monolithic front-end and back-end parts. The more modern approach is very much the opposite: a nearly continuous, very long sequence of very simple transforms, rewriting the code seamlessly all the way down from the front-end (i.e., the parser) to a back-end or multiple back-ends. And this approach is very alien to anything you'd find in the Dragon Book.
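To make that concrete, here is a toy flavor of the "many tiny rewrites" idea (entirely my own sketch, not any real compiler's IR): each pass is a small tree-to-tree transform, and the driver just chains them.

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { NUM, VAR, ADD, MUL } Kind;
    typedef struct Expr { Kind kind; int num; struct Expr *l, *r; } Expr;

    static Expr *mk(Kind k, int n, Expr *l, Expr *r)
    {
        Expr *e = malloc(sizeof *e);
        e->kind = k; e->num = n; e->l = l; e->r = r;
        return e;
    }

    /* pass 1: fold constant subtrees */
    static Expr *fold(Expr *e)
    {
        if (e->kind == NUM || e->kind == VAR) return e;
        e->l = fold(e->l);
        e->r = fold(e->r);
        if (e->l->kind == NUM && e->r->kind == NUM)
            return mk(NUM, e->kind == ADD ? e->l->num + e->r->num
                                          : e->l->num * e->r->num, NULL, NULL);
        return e;
    }

    /* pass 2: rewrite x * 2 as x + x (a stand-in for strength reduction) */
    static Expr *reduce(Expr *e)
    {
        if (e->kind == NUM || e->kind == VAR) return e;
        e->l = reduce(e->l);
        e->r = reduce(e->r);
        if (e->kind == MUL && e->r->kind == NUM && e->r->num == 2)
            return mk(ADD, 0, e->l, e->l);
        return e;
    }

    int main(void)
    {
        /* x * (1 + 1): fold turns (1 + 1) into 2, reduce turns x * 2 into x + x */
        Expr *e = mk(MUL, 0, mk(VAR, 0, NULL, NULL),
                             mk(ADD, 0, mk(NUM, 1, NULL, NULL),
                                        mk(NUM, 1, NULL, NULL)));
        Expr *(*passes[])(Expr *) = { fold, reduce };
        for (size_t i = 0; i < sizeof passes / sizeof *passes; i++)
            e = passes[i](e);
        printf("root after passes: %s\n", e->kind == ADD ? "ADD" : "other");
        return 0;
    }

The point is that each rewrite is small enough to understand and test on its own, and the pipeline is just a list of them.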
As for parsing, as I already said elsewhere in this thread, the techniques from the Dragon Book are not practical any more and are not used in modern compilers. There are far better ways, which are not covered in the book, and they're far simpler; they hardly even deserve a dedicated book.