Every time I see things like this, I feel like the person must be unaware of awk.
# the original one-liner to get unique IP addresses
cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head
# turns into this with GNU awk
gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ < 10) print a[i], i}' access.log
It's also far, far faster on larger files (base-spec M1 Air):
$ wc -lc fake_log.txt
1000000 218433264 fake_log.txt
$ hyperfine "gawk '{PROCINFO[\"sorted_in\"] = \"@val_num_desc\"; a[\$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt"
Benchmark 1: gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt
Time (mean ± σ): 1.250 s ± 0.003 s [User: 1.185 s, System: 0.061 s]
Range (min … max): 1.246 s … 1.254 s 10 runs
$ hyperfine "cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
Benchmark 1: cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
Time (mean ± σ): 4.844 s ± 0.020 s [User: 5.367 s, System: 0.087 s]
Range (min … max): 4.817 s … 4.873 s 10 runs
Interestingly, GNU cut is significantly faster than BSD cut on the M1:
$ hyperfine "gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
Benchmark 1: gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
Time (mean ± σ): 3.622 s ± 0.004 s [User: 4.149 s, System: 0.078 s]
Range (min … max): 3.616 s … 3.629 s 10 runs
The overwhelming cost of the first shell pipeline, at least on my machine, is caused by the default UTF-8 locale. As I have found in almost every other case, `LC_ALL=C` radically speeds this up.
By the way, these changes immediately suggested themselves after running the pipeline under `perf`. Profiling is always the first step in optimization.
Collation aside (disabling it is absolutely a huge speed boost that I neglected to think about), I assumed that the rest of the difference was coming from the fact that the initial `cut` meant the rest of the pipeline had far less to deal with, whereas `awk` is processing every line. Benchmarking (and testing in `perf`) showed this not to be the case. I'd need to compile `awk` with debug symbols, I think, to know exactly where the slowdown is, but I'm going to assume it's mostly due to `sort` being extremely optimized for doing one thing and doing it well.
I did find one other interesting difference between BSD and GNU tools - BSD sort defaults its buffer to 90% of available memory, while GNU sort defaults to 1024 KiB.
Combining all of these (and using GNU uniq - it was also faster), I was able to get down to 463 msec on the M1 Air:
$ hyperfine "export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head"
Benchmark 1: export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head
Time (mean ± σ): 463.4 ms ± 3.3 ms [User: 965.5 ms, System: 93.3 ms]
Range (min … max): 459.9 ms … 469.8 ms 10 runs
Parent poster used the 5% value, so I did as well. But yes, to answer your question - shown here on a fairly old Debian 11 system:
$ hyperfine "export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort -S5% | uniq -c | sort -rn -S5% | head"
Benchmark 1: export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort -S5% | uniq -c | sort -rn -S5% | head
Time (mean ± σ): 1.504 s ± 0.318 s [User: 2.833 s, System: 0.474 s]
Range (min … max): 0.942 s … 1.937 s 10 runs
$ hyperfine "export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
Benchmark 1: export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
Time (mean ± σ): 3.847 s ± 0.093 s [User: 4.165 s, System: 0.613 s]
Range (min … max): 3.591 s … 3.919 s 10 runs
Setting the buffer value to ~half the file size (100 MB) resulted in a mean time of 2.291 seconds. Setting it to the size of the file resulted in a mean time of 1.549 seconds, which is close enough to the 5% run to call it equal - this server is busy with other stuff anyway, so it's hardly a good place for precise benchmarking.
Syscalls aren't free, nor are disk reads, so if you have the RAM to support slurping the entire file at once, it's sometimes faster.
I don't understand the downvotes. This is a fair criticism. The author even points out "programs as pipelines" which is literally the UNIX philosophy. There are tools that already exist on UNIX-likes that more people should use instead of reaching for a script.
I can sympathize with the author w.r.t wanting to use a single language you like for everything. However, after decades I've found this to be untenable. There are languages that are just simply better for one-off scripting (Perl, Python), and languages that aren't (anything compiled). Trying to bolt an interpreter onto a compiled language from the outside seems like a lot of work for questionable gain.
> There are languages that are just simply better for one-off scripting (Perl, Python), and languages that aren't (anything compiled). Trying to bolt an interpreter onto a compiled language from the outside seems like a lot of work for questionable gain.
One reason is deployment. Writing code in python/node/etc... implies the ability of the production environment to bootstrap a rather complicated installation tree for the elaborate runtimes required by the code and all its dependencies. And so there are elaborate tools (npm, venv, Docker, etc...) that have grown up around those requirements.
Compiled languages (and Go in particular shines here) spit out a near-dependency-free[1] binary you can drop on the target without fuss.
I deal with this in my day job pretty routinely. Chromebooks have an old python and limited ability to pull down dependencies for quick test runs. Static test binaries make things a lot easier.
[1] Though there are shared libraries and runtime frameworks there too. You can't deploy a Gnome 3 app with the same freedom you can a TCP daemon, obviously.
> Compiled languages (and Go in particular shines here) spit out a near-dependency-free[1] binary you can drop on the target without fuss
I think it's more accurate to say "static binaries" instead of "compiled languages." The same headaches exist with dynamically linked, compiled binaries (and sometimes they're worse since you don't have a dependency manager, unless you add one)
> Static test binaries make things a lot easier
I think this really depends on your target environment and how much control you have over it. If you're in a Ruby or Python shop, for example, all your servers already have the stack installed. If you're targeting end user devices, those can have a huge mess of different configs to account for
Meh. In my experience version churn with shared library dependencies is pretty minor. You have to worry about and work at it if you're doing stuff like deploying a single binary across a bunch of different linux distros. But the straightforward case of "build on your desktop and copy the file up to the host" is a routine thing you can expect to work.
It's nothing like that with Python or Node. The rule there is that you get something working locally and then spend a while reverse engineering a pip/venv recipe or manifest or whatever to make it work somewhere else. It's decidedly non-trivial.
The 'scripting' vs 'compiled' language distinction is a false dichotomy. Awk, Perl, and Python are compiled programs too. What makes a 'scripting' language special? Dynamic typing? The lack of a compile step/delay?
I could imagine a lifetime of collecting scripting macros/libs in Lisp being as good or better.
However, the reason Bash is so prolific amongst sysadmins such as myself is the fact that its scripts are portable and reliable to use across Debian, Arch or RHEL based distributions.
You don't have to import extra libraries, ensure that you are running the proper python environment, or be certain that pip is properly installed and configured for whatever extra source code beyond what is included out of the box.
Bash is the most consistent code you can write to perform any task you need when you have to work with Linux.
Python is (at least in the CPython implementation) compiled - to Python bytecode, which runs on the Python virtual machine.
It's not compiled to native code. (Unless you use one of the compilers which do compile it to native code, though they tend to support only a subset of Python.)
Another commenter beat me to it but still: sh / bash / zsh are quite fine up until certain complexity (say 500 lines), after which adding even a single small feature becomes a huge drag. We're talking hours for something that would take me 10 minutes in Golang and 15 in Rust.
I can actually agree with this take. Most of the opinions I've seen in this vein take some absurdly small limit, like 5 lines. 500, though? Yeah. My team rewrote a ProxySQL handler in Python because the bash version had gotten out of hand, and there were only a handful of people who could understand what it was doing. It passed 100% of its ShellCheck checks, and was as modular as it could possibly be, but modifying it was still an exercise in pain.
> portable and reliable to use across Debian, Arch or RHEL based distributions
Until you try to use a newer feature, or try the script on a Mac or BSD, or with any older bash.
SH code is completely portable, but bash itself can have quite a few novel features. Don’t get me wrong - I’m happy the language is dynamic and still growing. But it can make things awkward when trying to use a script from a newer system on an older server (and the author has been “clever”).
> The 'scripting' vs 'compiled' language distinction is a false dichotomy.
Not false, but perhaps in need of better definition. The term script has often denoted a trivial set of commands run by $interpreter.
"Scripting languages" have been seen as being in contrast to C, C++, Pascal, Java, SmallTalk, &c. The scripting languages remove from the user the need:
Closer to the truth is that static typing is a nuisance to a sole dev working on a short time horizon. Many successful startups get stuck with overgrown 'scripts' as platforms because they started as one-man programming shops.
I do have to add that Python, more than any other language I've used, results in code that works on the first try so often that it's no longer surprising.
> The author even points out "programs as pipelines" which is literally the UNIX philosophy.
Yes - and if the thing I'm trying to do has a small input, will only be done once, and so on, I will often just pipe `grep` to `sort` or whatever, because it's less typing and it's generally clearer to a wider range of people.
But on larger inputs, or even things like doing a single pattern inversion mixed with a pattern match, I like awk.
One reason the author could be doing this is to reduce dependencies. Maybe they deploy to Windows or to some other environment not guaranteed to have those utilities. Also testing probably gets simplified.
And every time I see things like that, I feel like the person must be unaware of perl.
I've made this point before, but I still find it hilarious. For more than a decade, awk was dead. Like, dead dead. There was nothing you could do in awk that wasn't cleaner and simpler and vastly more extensible in perl. And, yes, perl was faster than gawk, just like gawk is faster than shell pipelines.
Then python got big, people decided that they didn't want to use perl for big projects[1], and so perl went out of vogue and got dropped even for the stuff it did (and continues to do) really well. Then a new generation came along having never learned perl, and...
... have apparently rediscovered awk?
[1] Also the perl 5 tree stagnated[2] as all the stakeholders wandered off into the weeds to think about some new language. They're all still out there, AFAIK.
[2] Around 2000-2005, perl was The Language to be seen writing your new stuff in, so e.g. bioinformatics landed there and not elsewhere. But by 2015, the TensorFlow people wouldn't be caught dead writing perl.
Perl never recovered from its "many ways to do things" label. It's a tired criticism of the language, but it's lodged in the brains of a generation of programmers, which is unfortunate.
Also, the classic sysadmin role, which used to lean on Perl heavily, sort of evolved with the rise of The Cloud, and automation tools like Chef, Puppet, and Ansible took over in that 2005-2015 time frame.
I am in the "awk > perl" camp. I think the idea of "vastly more extensible" is a negative for my scripting language, and "cleaner" just doesn't matter - I just want to write it the one time I want to use it and then be done with it. The awk language is really simple and quick to write.
By the way, I think this is why Perl lost to Python on larger scripting and programming projects - it's just easier to write (albeit harder to read, to antagonize the Python lovers out there).
I learned perl around that time, and I thought it was awful - and I mean just about everything about it: the parameter passing, the sigils that made BASIC look like Dijkstra's love child, the funky array/scalar coercion, and the bloody fact that it couldn't read from two files at once even though the docs suggested it should work. They didn't say so explicitly, because perl was pretty badly documented. My boss started writing object oriented perl, and that made perl unreadable even to perl experts.
AWK, on the other hand, is simplicity itself. Sure, it misses a few things, but for searching through log files or db dumps it's an excellent tool. And it's fast enough. If you really need much more speed, there are other tools, but I would rather rewrite it in C than try perl again.
They taught awk to my boy in bioinformatics as part of his degree. I was like Vito Corleone in the funeral home when he showed me the FASTA parsing awk code they were working on.
I mostly use awk over perl because awk is completely documented in one man page, so it's easy to see whether awk will be fit for purpose or whether I should write it using a real programming language. I learned Perl over a decade ago, but not the really concise dialect you would use on the command line for stuff I'd use awk for, and I've forgotten almost all of it now. At least with awk it's easy to relearn the functions I need when I need it.
Right, which is sort of my point. 20 years ago, "everyone" knew perl, at least to the extent of knowing the standard idioms for different environments that you're talking about. And in that world, "everyone" would choose perl for these tasks, knowing that everyone else would be expert enough to read and maintain them. Perl was the natural choice.
And in a world where perl is a natural choice for these tasks, awk doesn't have a niche. Because at the end of the day awk is simply an inferior language.
Which is the bit I find funny: we threw out and forgot about a great tool, and now we think that the ancestral toy it replaced is a good idea again.
That's a fair criticism. I know Perl can do pretty amazing things with text, but I've never bothered to learn it.
EDIT: I decided to ask GPT-4 to translate the gawk script to Perl. I make zero claims that this is ideal (as stated, I don't know Perl at all), but it _does_ produce the same output, albeit slightly slower than the gawk script.
$ hyperfine "perl -lane '\$ips{\$F[0]}++; END {print \"\$ips{\$_} \$_\" for (sort {\$ips{\$b} <=> \$ips{\$a}} keys %ips)[0..9]}' fake_log.txt"
Benchmark 1: perl -lane '$ips{$F[0]}++; END {print "$ips{$_} $_" for (sort {$ips{$b} <=> $ips{$a}} keys %ips)[0..9]}' fake_log.txt
Time (mean ± σ): 1.499 s ± 0.006 s [User: 1.447 s, System: 0.050 s]
Range (min … max): 1.490 s … 1.507 s 10 runs
Sample of one. I came of age on Linux in the late 90s/early 00s. Through other nerds on IRC channels I became familiar with Perl and didn't like it. I also picked up basic awk in the context of one-liners for shell pipelines and it was pretty nice for that. Easier to remember than the flags for cut and friends.
Learning awk a bit more deeply in recent years has been good too. I can write one liners that do more. I shipped a full awk script once, for something unimportant, but I would never do that again. For serious text munging these days I'd rather write a Rust program.
Way to completely miss the point and turn this into a weird pissing competition (btw your "simple" awk example is super complicated and opaque to someone who doesn't have the awk man page open in front of them).
The script package looks really cool and I'll definitely try it out, because honestly, even though I do a lot of bash scripting, it's super painful for anything beyond the very simple.
If someone doesn't know awk, then of course it'll be complicated and opaque - the same is true of practically any language. One-liners in general also tend to optimize for space. If you wanted it to be pretty-printed and with variable names that are more obvious:
{
    PROCINFO["sorted_in"] = "@val_num_desc"
    top_ips[$1]++
}

END {
    counter = 0
    for (i in top_ips) {
        if (counter++ < 10) {
            print top_ips[i], i
        }
    }
}
But also, if you read further up in the thread, you'll see that another user correctly identified the bottlenecks in the original pipeline, and applying those optimizations made it about 3x as fast as the awk one. Arguably, if you weren't familiar with the tools (and their specific implementations, like how GNU sort and BSD sort have wildly different default buffer sizes), you'd still be facing the same problem.
At least half of what people complain about with shell scripts can be solved by using ShellCheck [0], and understanding what it's asking you to do. I disagree with the common opinion of "anything beyond a few lines should be a Python script instead." If you're careful with variable scoping and error handling, bash is perfectly functional for many uses.
> If someone doesn't know awk, then of course it'll be complicated and opaque - the same is true of practically any language
I don't think this is true. Before I learned Go, I could follow along most Go programs pretty well, and learning Go well enough to get started took less than an hour. Every attempt I've made to learn more Awk, I've bounced off.
Really? I learned awk by watching a one-hour YouTube video one afternoon. It being a DSL really makes it super easy to learn, and this, to me, suggests you probably haven't given it much time.
Good for you, but their point still holds up. Languages like Python and Go are more readable than awk and bash. They are designed to be that way, and many, many years of effort have been put into them for that specific purpose.
Whereas if you know awk and bash, they can be incredibly useful in a pinch. That doesn't knock how powerful they are - I think they're worth learning. But if something needs to be maintained, then there is an argument for Python/Go/whatever.
Because it's really fast to iterate on if you know it, it's available basically everywhere and has no external dependencies, and you don't have to compile it.
I don't do a lot of shell scripting type things in Go because it's not a great language for it, but when I do, I take another approach, which is just to panic. Generics offer a nice little
func Must[T any](x T, err error) T {
    if err != nil {
        panic(err)
    }
    return x
}
which you can wrap around any standard "x, err :=" function to just make it panic, and even prior to generics you could wrap a "PanicOnErr(justReturnsErr())".
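To make that concrete, here's a quick sketch of how it reads in a throwaway script (the inputs are just placeholders; only the standard library and the Must helper above are involved):

package main

import (
    "fmt"
    "os"
)

// Must is the helper from above: panic on the first error, otherwise hand back the value.
func Must[T any](x T, err error) T {
    if err != nil {
        panic(err)
    }
    return x
}

func main() {
    // Hypothetical inputs; the script simply dies at the first failed call,
    // which is often exactly what you want from a quick one-off tool.
    raw := Must(os.ReadFile("access.log"))
    home := Must(os.UserHomeDir())
    fmt.Printf("read %d bytes, home dir is %s\n", len(raw), home)
}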
In the event that you want to handle errors in some other manner, you trivially can, and you're not limited to just the pipeline design patterns, which are cool in some ways, but limiting when that's all you have. (It can also be tricky to ensure the pipeline is written in a way that doesn't generate a ton of memory traffic with intermediate arrays; I haven't checked to see what the library they show does.) Presumably if I'm writing this in Go I have some other reason for wanting to do that, like having some non-trivial concurrency desire (using concurrency to handle a newline-delimited JSON file was my major use case, doing non-trivial though not terribly extensive work on the JSON).
While this may make some people freak, IMHO the real point of "errors as values" is not to force you to handle the errors in some very particular manner, but to make you think about the errors more deeply than a conventional exceptions-based program typically does. As such, it is perfectly legal and moral to think about your error handling and decide that what you really want is the entire program to terminate on the first error. Obviously this is not the correct solution for my API server blasting out tens of thousands of highly heterogeneous calls per second, but for a shell script it is quite often the correct answer. As something I have thought about and chosen deliberately, it's fine.
If you're not familiar with Go there is one detail missing from this post (though it's in the script README) - what a complete program looks like. Here's the example from https://github.com/bitfield/script#a-realistic-use-case
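From memory, it boils down to something like this - a sketch rather than a verbatim copy of the README (Stdin/Column/Freq/First/Stdout are the package's pipe methods; the log.Fatal handling is my own addition):

package main

import (
    "log"

    "github.com/bitfield/script"
)

func main() {
    // Top ten most frequent values of the first whitespace-separated column
    // (e.g. client IPs in an access log), read from stdin.
    _, err := script.Stdin().Column(1).Freq().First(10).Stdout()
    if err != nil {
        log.Fatal(err)
    }
}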
If one were actually going to use something like this, I’d think it’d be worth implementing a little shebang script that can wrap a single-file script in the necessary boilerplate and call go run!
The whole point of using Go is to explicitly handle errors as they happen. All of these steps can fail, but it’s not clear how they fail and if the next steps should proceed or be skipped on previous failures. This is harder to reason about, debug, and write than grep and bash.
Aborting on the first error is sufficient for a quick scripting hack designed to be run interactively.
I don't see it as a lot different to bash scripts with -e and pipefail set, which is generally preferable anyway.
Plenty of Go code does
    if err != nil {
        return nil, err
    }
for each step, and there are plenty of cases where you only care -if- it failed, plus some sort of description of the failure. If you want to proceed on some errors, you'd split the pipe up so that it pauses at moments where you can check that and compensate accordingly.
(and under -e plus pipefail, "error reported to stderr followed by aborting" is pretty much what you get in bash as well, so I'm unconvinced it's actually going to be harder to debug)
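For what it's worth, a rough sketch of what "splitting the pipe up" could look like with the script package - the fallback-to-stdin logic is purely illustrative, not a pattern the package prescribes:

package main

import (
    "log"

    "github.com/bitfield/script"
)

func main() {
    // Pause the pipeline where failure matters and inspect the pipe's error.
    p := script.File("access.log").Match("error")
    if err := p.Error(); err != nil {
        // Compensate: log the problem and fall back to reading stdin instead.
        log.Printf("couldn't read access.log (%v); falling back to stdin", err)
        p = script.Stdin().Match("error")
    }

    count, err := p.CountLines()
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%d matching lines", count)
}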
From a technical point of view, nothing prevents the scripting package from being just as informative with errors as bash, and from having a helper to log and clear the error. If that's not already the case, I'd call it a bug.
Maybe there could be a whole class of things that are like errors, but not as severe, and flags that deal with them as a group! We could call them "warnings"
If you choose to do that with warnings, that's your prerogative. I pretty much take the opposite approach personally; before putting code into review (or merging, if it's a personal project with no reviewers), every warning is inspected, and I decide whether to fix it by changing the code not to generate it or to suppress it manually in that specific location only. 99% of the time the warning is a sign of something actually wrong, but for the 1% where I know I genuinely don't care (and the very frequent occasions during development when I'm simply not finished with the implementation yet), it's much better for it not to block me completely.
I agree. When I am scripting, I want a quick feedback loop. You can't really do that with Go because it doesn't have as good introspection and debugging capabilities as a scripting language like Ruby, and it doesn't have exceptions, which means that error handling is more verbose than necessary.
Also, I like being able to make modifications on the fly: doing something in Ruby, I can just open the file, make adjustments, and I am done. With Go, I have to compile it and move it back into my path, which is really tedious.
Shell scripting is quite fine up until certain complexity (say 500-1000 lines), after which adding even a single small feature becomes a huge drag. We're talking hours for something that would take me 10 minutes in Golang and 15 in Rust.
Many people love to smirk and say "just learn bash properly, duh" but that's missing the point that we never do big projects in bash so our muscle memory of bash is always kind of shallow. And by "we" I mean "a lot of programmers"; I am not stupid, but I have to learn bash's intricacies every time almost from scratch and that's not productive. It's very normal for things to slip up from your memory when you're not using them regularly. To make this even more annoying, nobody will pay me to work exclusively with bash for 3 months until it gets etched deep into my memory. So there's that too.
I view OP as a good reminder that maybe universal-ish tools to get most of what we need from shell scripting exist even today, but we aren't giving them enough attention and energy, and we don't make them mainstream. Though it doesn't help that Golang doesn't automatically fetch dependencies when you just do `go run random_script.go`: https://github.com/golang/go/issues/36513
I am not fixating on Golang in particular. But IMO next_bash_or_something should be due Soon™. It's not a huge problem to install a single program when provisioning a new VM or container either, so I am not sure why people are so averse to it.
So yeah, nice article. I like the direction.
EDIT: I know about nushell, oilshell and fish but admittedly never gave them a chance.
The unix philosophy of having small programs that take in input, process it, and return a result has proven to be a success; I just never understood why the next logical step of having these programs in library form never became a thing. I guess shells are a bit useful, but not as useful as a decent REPL (Common Lisp or the Jupyter REPL) where these programs can be used as if they were functions.
Would love to use more Golang - amazing build system and cross compiler built in. "All in one" binaries are the best thing ever. I adore most of the ideas in the language.
.... but there are just soooo many little annoyances / inconveniences which turn me off.
- No Optional Parameters. No Named Parameters. Throw us a bone Rob Pike, it's 2023. Type inferred composite literals may be an OK compromise.. if we ever see them: https://github.com/golang/go/issues/12854
- Unused import = will not compile. Unused variable = will not compile. Give us the ability to turn that off, or at least downgrade it to a warning.
- No null safe or nullish coalescing operator. (? in rust, ?? in php, etc.)
- Verbosity of if err != nil { return err; }
- A ternary operator would be nice, and could bring if err != nil to 1 line.
- No double declarations. “no new variables on left side of :=” .. For some odd reason “err” is OK here... Would be highly convenient for pipelines, so each result doesn't need to be uniquely named.
I'd describe Go as a "simple" language- Not an "easy" language. 1-2 lines in Python is going to be 5-10 lines in golang.
Stay in the Go ecosystem, retain compatibility with the Go programs you already have, but have a much more concise scripting capability at your disposal.
When this happens, one can split access.log into pieces, process the pieces separately, then recombine.
But that's more or less what sort(1) does with large files anyway, creating temporary files in $TMPDIR (or in a user-specified directory given with -T, if using GNU sort).
There was a way to eliminate duplicate lines from an unordered list using k/q, without using temporary files but I stopped using it after Kx, Inc. was sold off and I started using musl exclusively. q requires glibc.
I have been thinking that JS template literals could be a great replacement for shell programming, allowing more powerful syntax that emulates a lot of bash's useful features while still giving you the power of a proper programming language.
Interesting. I do something similar with my task package (https://github.com/kardianos/task), which is in turn loosely based on another package from 10-15 years ago.
That sounds interesting, but the package is unfortunately undocumented. I tried https://pkg.go.dev/github.com/kardianos/task, but that doesn't help me understand it either. It's missing a high level explanation of what to use it for, its limits and some decent examples.