Awk vs. Perl (2009) (aplawrence.com)
46 points by type0 on June 27, 2017 | hide | past | favorite | 71 comments


I learned Awk in 1988 before Perl was around (on our systems, anyway). It was super useful at the time. But if you know Perl and Perl is available on your system, there's certainly not a compelling need for writing standalone, multi-line Awk programs. But Awk is really, really useful for one-liners. As Larry Wall has said: "I still say awk '{print $1}' a lot."

Brian Kernighan himself, in this 2015 talk [1] on language design, states that Awk was primarily intended for one-liner usage (he mentions this at 20:43).

[1] https://youtu.be/Sg4U4r_AgJU?t=19m45s


I don't have it in writing or video, but in 2008 at ACM Reflections/Projections at University of Illinois I was involved in a long conversation with Larry Wall and Al Aho. It was largely about the history and lineage of programming languages.

Al said that if Perl had existed first there wouldn't have been an awk. I pointed out that parts of Perl are inspired by awk and might have otherwise been inspired by SNOBOL or ICON, at which point everyone present seemed to agree means we're thankful for awk. I take it as high praise when Al Aho defers to your tool.

I was just reminiscing with Larry about that discussion last week at The Perl Conference in Arlington. He said he had fond memories of that conversation and that he and Al went for lunch the next day after that conversation, too. I'd have loved to be there for that.


Once I learned Perl I never used awk or sed again. Even for one-liners, with the -n, -p, and -a options you can easily write Perl one-liners that are as concise as those in Awk.


How do you write

    awk '{print $3 ":" $1 " " $2}'
in Perl?


  perl -aE 'say "$F[2]:$F[0] $F[1]"'


That's indeed concise, but it doesn't work. I think you need -naE


-a implies -n since v5.19.3.


One way would be:

   perl -nae'printf("%s:%s %s\n",$F[2],$F[0],$F[1])'


I would not call that "as concise as awk"


Lisp:

  $ txr -e '(awk ((prn `@[f 2]:@[f 0] @[f 1]`)))'
  1 2 3
  3:1 2
How about input from a string stream? At the REPL:

  1> (with-in-string-stream (*stdin* "1 2 3")
       (awk ((prn `@[f 2]:@[f 0] @[f 1]`))))
  3:1 2
  nil
It pays not to have awk be some canned global behavior enabled by a command line option.


Another way, without using a quasiliteral: just set the output field separator (ofs) to empty string, and prn:

  txr -e '(awk (:set ofs "") ((prn [f 2] ":" [f 0] " " [f 1])))'
This is like:

  awk -v OFS= '{print $3, ":", $1, " ", $2}'


I've written reasonably complicated stuff in awk (like a page or two of code). Probably could have solved those problems more elegantly with another tool, but I never learned perl, and I find awk simple enough that the man page is all I need to refresh my memory. For text manipulation that's a one-off or likely to not need further maintenance, I think it's great.


Back at Netcraft for the data crunching pipelines for the surveys we tended to start off with sed+awk for expressivity/concision of operations and then rewrite in perl later for performance.


The nice thing about awk is that it's really only suitable for simple record processing. You therefore stop using it fairly quickly when the problem reaches sufficient complexity to need a better-suited language.

It's like a very neat, small domain specific language for use processing simple record based text files. And within those limits it's super useful.


>The nice thing about awk is that it's really only suitable for simple record processing.

Not exactly only simple processing, IMO. Record processing, yes - it is a domain-specific language, its core feature is the pattern-action pairing, and it was (at least originally) meant mainly to operate on record-oriented text files (even if just line by line, since a line is a record). I don't have good examples off the top of my head, but can mention a few things:

The books The Unix Programming Environment by Kernighan and Pike, and either Programming Pearls or More Programming Pearls, by Jon Bentley, have some examples of advanced uses of awk - so it is not just for simple record processing. And I've read somewhere that just shell and awk and other Unix tools have even been used to create a DBMS of sorts. Plus, with later versions like GNU awk and so on, they've added more features to the language, including probably many more built-in functions, and also some network programming support [1].

https://www.quora.com/What-is-the-most-complex-software-writ...

According to an answer at the above link, an nroff-subset text formatter and Lisp subset interpreter have been written in awk. Those are more complex than simple record processing.

Also:

[1] Effective awk Programming, 3rd Edition http://shop.oreilly.com/product/9780596000707.do


Right, I mean it's Turing complete, and you can do anything you want. But at least when I use it, I quickly reach for another language after using it for basic reformatting/simple calculations because using awk to do so would be too difficult and awkward.


I get you now.


The article doesn't say much, besides replaying many common thoughts. Actually, as an article, it's pretty much useless.

I have been reading "The AWK Programming Language" and actually kinda fell in love with awk.

Awk does many simple things, it's fairly nice to combine such simple things, and it's generally a nice and handy tool to know.


Yeah it's basically "I know Perl and I don't know Awk". I'm surprised it got to the front page.

Obviously, the two languages overlap in functionality, and the one you know is going to be easier for the job (for you).

If you know neither, Awk is easier to learn, but that's of course because it does less. That may be good or bad, depending on what you are trying to do.


> The article doesn't say much, besides replaying many common thoughts. Actually, as an article, it's pretty much useless.

Yep, it's totally useless and is really only designed to draw ad traffic (god there's a lot of ads)


So is Perl, you should try it.


While I agree with the sentiment of the article, it misses a larger argument. Perl is reasonably good at "making common things easy and hard things possible". Meaning if the needs might vary, it's not a stretch to evolve the code to match the requirements.

And despite having a strong affinity for Perl, I don't think it's productive to just banish Awk to the trash: part of being well-rounded is knowing many tools—which can overlap to some degree—and appreciating the sweet spot of each one. For some one-liners, Awk is just as elegant as (if not more elegant than) Perl.


Could you give some examples of one-liners you feel might be more elegant in Awk than in Perl? It might be fun to get a thread going where people try to give clean Perl versions.


Basically anything that uses $1, $2,... because you save a split().

  awk '$4 ~ /T/ { print $1 }'
  perl -nE '@x = split /\s+/; say $x[0] if $x[3] =~ /T/;'
... stares at this and wonders ... reads `man perlrun` ... Oooh, there's a switch to auto-split-on-whitespace in Perl.

  perl -anE 'say $F[0] if $F[3] =~ /T/;'
Sooo this doesn't quite answer your question anymore, but I'm going to leave this here since it's a nice instance of TIL.


The best part about using the @F array is that you get more than 10 things you can split and dereference. $0-$9 only in awk.

That, and maybe because I'm an old Perl diehard, it's just easier to pipe to perl than remember how awk does its shenanigans.


Awk is not limited to $0 to $9; where did you get that? Maybe you're confusing this with backreferences in regex: \1 through \9.

The nice thing about the @F array is that $F[0] doesn't represent the whole record.

There is a class of Awk bugs whereby arithmetic is being done on the parameters using $N (where N is some calculated variable containing an integer). Everything is fine when N >= 1, but if there is a bug where N is accidentally zero, then $N refers to the whole record instead of throwing an out-of-bounds error.

And things are accidentally zero quite easily in awk, e.g.

   awk '{print $X}'  # X is undefined so serves as zero; this prints all lines
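A minimal sketch of that failure mode (the variable name n is arbitrary):

```shell
# n set to a valid index: prints the first field, as intended
echo "a b c" | awk -v n=1 '{print $n}'    # prints: a

# n never set, so it evaluates to zero: $n silently becomes $0,
# the whole record, instead of raising an out-of-bounds error
echo "a b c" | awk '{print $n}'           # prints: a b c
```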


Solaris awk. :)


I see on my Solaris 10 VM that indeed the "broken old awk" doesn't work past $9.

However, on the same OS, "nawk" does.


Yep, there is also /usr/xpg4/bin/awk vs /usr/ucb/bin/awk etc...

It's been a long time since I've had to deal with Solaris, but over about 10 years of it I learned to treat awk with a lowest-common-denominator approach. Which means I tend to avoid it even to this day.


  > perl -anE 'say $F[0] if $F[3] =~ /T/;'
                                        ^
                                        ^
Please tell me that isn't required. Fail!


Don't worry, it's not required.


  txr -e '(awk ([#/T/ [f 3]] (prn [f 0])))'
-e is just "evaluate Lisp expression for side effects; don't print its value"


Oh, that looks nice. I'm guessing this in TXR Lisp (http://www.nongnu.org/txr/), and that TXR is your project.


Awk one-liners don't depend on cryptic combinations of one-letter option flags to change the behavior of something in the language.


You can save one character:

-a implicitly sets -n.


I think awk is great; I enjoy working with awk. The GNU version is indispensable because of its multidimensional array support, of course. Perhaps I enjoy the way it feels like driving an antique car, it really harkens back to a bygone age. Yet it's fast.

I don't want to invest the time in Perl when there are better general-purpose languages out there. Perl certainly has a different look; I can understand why some would love it.


I have a strong affinity for Awk. I also have a mild dislike of Perl. Not sure if this is normal or odd.


I find Perl extremely useful for making prototypes really quickly, and never learned how to use Awk because I only started with Perl in the first place a year or two ago. I'm sure it's just a matter of preference.


Preference and experience.

My time with Perl was wrestling with code that people wrote to show how good they were at Perl, and where maintainability was a secondary concern. This is a bit harder to do in Awk (but not impossible). I moved from Perl to Python in the 1.5 days in part because of this.

I believe this occurs with Java now too. Neither Perl nor Java are bad languages, they are both powerful and performant. However there are some architecture astronauts who seem to enjoy making life harder for the rest of us.


+1 for "architecture astronauts", you made my day.


To me that question never really arose. I learned Perl before I knew the shell well enough to make use of awk or sed. By the time my shell scripts hit the complexity ceiling, I happily switched to Perl and haven't regretted it.

The shell scripts I write are usually just canned commands, maybe some tweaking with environment variables. Anything more complex I usually handle using Perl.

Shell scripts supposedly are more portable because the Bourne shell is always there, but that is only true across different Unix flavours. Once you run into a Windows box (or more exotic systems like OS/400 (whatever IBM calls it these days), VMS, z/OS, BS2000), shell scripts won't help you. Perl will. ;-)

Last but not least: CPAN. There are perl modules available for nearly anything you could ask for. (Except talking to SharePoint, which I sadly have to do on a fairly regular basis. OTOH, having known its horrors, I can understand why the Perl community wants to stay away from that POS.)


There is no reason to deal with multiple flavors of Awk when you have Perl, especially when you are working with different operating systems. That, and Perl's regex syntax being the de facto standard across multiple languages, made me completely do away with Awk.


It would have been nice to have a comparison of typical command-line scenarios, and perhaps even a speed comparison.


I would also be interested in this.


Awk fits on a manpage and in my head. Perl doesn't.


FWIW, Ruby has a lot of the same command line arguments that Perl does.


Yes, Ruby was inspired quite a bit by Perl. Matz says so somewhere, IIRC. Even its regex support (I mean the way it is integrated into the language, unlike in Python, not the fact that Ruby supports regex per se) and some language features are inspired by Perl. Ruby was also influenced by either or both of Lisp and Smalltalk, I've read.


Ruby does not have $_, which you happen not to see in Perl one-liners because of its implicit nature; that implicitness makes those one-liners very clear and terse.


Ruby has $_

     seq 1 20 | ruby -pe 'puts "num is #{$_}"'
Perl is probably a bit more concise, but you can do a lot of stuff with Ruby, which is convenient if you use it for other things too.


The author of this article makes himself appear bat-shit crazy for his claims about Perl syntax being better.

In Awk, you can define a function like this:

  function add(a, m, n,
               sum)
  {
    for (sum = 0; m < n; m++)
      sum += a[m]
    return sum
  } 
Named parameters (wow, there is a concept); and few dermatological problems: no damn sigils, or required statement-terminating semicolons. The array reference is a[m], the scalar is m and so on.

Using an extra parameter (which is not specified in the call) for the local variable sum is a massive design fail, but no worse than any of the Perl design fails.
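A runnable sketch of that convention: sum is listed as a trailing "parameter" purely to make it local, and the call simply omits it (split here builds a 1-based array, so indices 1 through 3 are summed):

```shell
awk '
  function add(a, m, n,   sum) {   # extra gap before sum: local by convention
    for (sum = 0; m < n; m++)
      sum += a[m]
    return sum
  }
  BEGIN { split("10 20 30", a); print add(a, 1, 4) }'
# prints: 60
```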

Perl repeats most of the design flaws in Awk, with compounded interest.


When I need to do something like this I usually use sed. It's similar enough to vim that I don't have to remember too much. I've never liked perl at all and awk doesn't offer that much more than sed. Maybe that'd be different if this sort of wrangling was a steady diet but it isn't. If I was gonna use awk I'd just skip straight on through to python.


> awk doesn't offer that much more than sed

Not much; just, oh, stuff like floating-point math and trig functions; integer math, associative arrays, access to environment variables, named functions with arguments and full recursion; control and looping structures like if, for and while, file I/O and redirection, string literals with C escapes, ...

  $ awk '$0=sin($1) + cos($2)'
  0 1
  0.540302
  1 0
  1.84147
How about sed?

Just kidding ...
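To make the gap concrete, here is a word-frequency count using an associative array, something sed has no real answer to (a sketch; sort is only there to make the output order deterministic):

```shell
printf 'red blue red\nblue red\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' |
  sort
# prints:
# blue 2
# red 3
```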


Holy double angle formula, Batman. Awk has command line trig functions? Riddle me this. Is the Google search bar just an Awk script and they won't tell you? That's just the sneaky kind of trick L+S would try.


> awk doesn't offer that much more than sed

The one huge thing that sed is particularly bad at (unless I don't know the particular trick yet) is to split fields on a pattern. I frequently have pipes where sed is used for most of the text-processing, but then cut is used to select one (or multiple) character-separated fields, or awk is used to select one (or multiple) whitespace-separated fields. I only go one step further (usually to Perl) when operations involve multiple lines, e.g. when columns need to be summed up.
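A sketch of that division of labor: cut handles single-character delimiters, while awk's -F accepts a full regex, e.g. runs of mixed separators:

```shell
# cut: fixed single-character delimiter
echo "root:x:0:0" | cut -d: -f2                   # prints: x

# awk: the field separator is a regex, so runs of colons or commas all count
echo "root::x,,0" | awk -F'[:,]+' '{print $2}'    # prints: x
```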


>awk doesn't offer that much more than sed

Wrote this earlier in this same thread:

https://news.ycombinator.com/item?id=14648534



I love awk, it's my go-to. Somehow learning Perl sufficiently (yet another programming language) is more mental work I'd rather not undertake, what with its idiosyncrasies and often implicit nature.

I find awk easy to understand when I read a script after not seeing it for a long time (though I have to google each time I write string manipulation or use arrays).


I had to find the longest line in a really big file and tried both awk and Perl. I was surprised at how much faster Perl was:

http://alquerubim.blogspot.com.br/2016/09/a-linha-mais-compr...

Still, awk was simpler.


Your benchmark doesn't specify which implementation of Awk you're testing, and which version of it.

Your code is verbose: length($0) shortens to just length:

  awk 'length > max { max = length } END { print max }'
The Perl version seems to have an off-by-one error.


It was an Oracle Linux 6 (64 bits) with GNU Awk 3.1.7 and Perl 5.10.1.


The Perl version counted the newline char, hence the difference.


At work I grapple with an HPC job scheduler written as a set of concurrent Perl scripts coordinating through transactions in a Postgres database. What a disaster.

Still, it is good fortune that the scheduler was not written in awk.


Given that the comments are from 2009, I expect the title is slightly misdated.


Yeah looks like it, updated it


I can't visit this site on mobile without being redirected to a spam page that looks suspiciously like Facebook.


I wondered at first because the text had some gaps in it, and thought that images were not loaded because of my strict uMatrix policy. When enabling the usual suspects (ajax.googleapis.com) didn't help, I inspected the gap and saw that it was just an ad container. And in fact, the same ad service is advertised below the article text.


Anyone else learn of Perl in the footnotes of the "sed & awk" book? :)


IIRC I first learned of Perl in the Unix Power Tools book from O'Reilly. That book was really good.


why choose? TIMTOWTDI


Because at some point you actually have to DI, and then you have to choose a W.


can confirm.



