Scripting with Go (2022) (bitfieldconsulting.com)
150 points by gus_leonel on Aug 20, 2023 | 105 comments


Every time I see things like this, I feel like the person must be unaware of awk.

  # the original one-liner to get unique IP addresses
  cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head
  # turns into this with GNU awk
  gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ < 10) print a[i], i}' access.log
It's also far, far faster on larger files (base-spec M1 Air):

   $ wc -lc fake_log.txt
   1000000 218433264 fake_log.txt

  $ hyperfine "gawk '{PROCINFO[\"sorted_in\"] = \"@val_num_desc\"; a[\$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt"
  Benchmark 1: gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt
  Time (mean ± σ):      1.250 s ±  0.003 s    [User: 1.185 s, System: 0.061 s]
  Range (min … max):    1.246 s …  1.254 s    10 runs

  $ hyperfine "cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
  Benchmark 1: cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
  Time (mean ± σ):      4.844 s ±  0.020 s    [User: 5.367 s, System: 0.087 s]
  Range (min … max):    4.817 s …  4.873 s    10 runs
Interestingly, GNU cut is significantly faster than BSD cut on the M1:

  $ hyperfine "gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
  Benchmark 1: gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
  Time (mean ± σ):      3.622 s ±  0.004 s    [User: 4.149 s, System: 0.078 s]
  Range (min … max):    3.616 s …  3.629 s    10 runs


The overwhelming cost of the first shell pipeline, at least on my machine, is caused by the default UTF-8 locale. As I have found in almost every other case, `LC_ALL=C` radically speeds this up.

  Original: 3.294s
  w/ LC_ALL=C: 1.055s
  w/ larger sort buffer `-S5%`: 0.780s
  Your gawk: 1.772s
  + LC_ALL=C: 1.772s
By the way, these changes immediately suggested themselves after running the pipeline under `perf`. Profiling is always the first step in optimization.
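Concretely, the fastest variant of the original pipeline looked something like this (the 5% buffer is just the value I happened to try, not a tuned number):

  export LC_ALL=C
  cut -d' ' -f 1 access.log | sort -S5% | uniq -c | sort -rn -S5% | head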


Collation aside (which is absolutely a huge boost in speed that I neglected to think about), I assumed that the rest of the difference was coming from the fact that the initial `cut` meant the rest of the pipeline had far less to deal with, whereas `awk` is processing every line. Benchmarking (and testing in `perf`) showed this to not be the case. I'd need to compile `awk` with debug symbols, I think, to know exactly where the slowdown is, but I'm going to assume it's mostly due to `sort` being extremely optimized for doing one thing, and doing it well.

I did find one other interesting difference between BSD and GNU tools - BSD sort defaults to 90% for its buffer, GNU sort defaults to 1024 KiB.

Combining all of these (and using GNU uniq - it was also faster), I was able to get down to 463 msec on the M1 Air:

  $ hyperfine "export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head"
  Benchmark 1: export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head
  Time (mean ± σ):     463.4 ms ±   3.3 ms    [User: 965.5 ms, System: 93.3 ms]
  Range (min … max):   459.9 ms … 469.8 ms    10 runs
TIL, thank you.


Could you elaborate on how you arrived at 5% for your buffer? Does specifying a buffer size really cause that much of a speed up?


Parent poster used the 5% value, so I did as well. But yes, to answer your question, it does. Shown here on a fairly old Debian 11 system:

  $ hyperfine "export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort -S5% | uniq -c | sort -rn -S5% | head"
  Benchmark 1: export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort -S5% | uniq -c | sort -rn -S5% | head
  Time (mean ± σ):      1.504 s ±  0.318 s    [User: 2.833 s, System: 0.474 s]
  Range (min … max):    0.942 s …  1.937 s    10 runs
 
  $ hyperfine "export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
  Benchmark 1: export LC_ALL=C; cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
  Time (mean ± σ):      3.847 s ±  0.093 s    [User: 4.165 s, System: 0.613 s]
  Range (min … max):    3.591 s …  3.919 s    10 runs
Setting the buffer value to ~half the file size (100 MB) resulted in a mean time of 2.291 seconds. Setting it to the size of the file resulted in a mean time of 1.549 seconds, which is close enough to the 5% run to call it equal - though this server is busy with other stuff, so it's hardly a good place for perfect benchmarking.

Syscalls aren't free, nor are disk reads, so if you have the RAM to support slurping the entire file at once, it's sometimes faster.
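If you want to see the syscall side of it for yourself, something like strace's summary mode makes the difference between a small and a large buffer obvious:

  # compare total syscall counts for a small vs. large sort buffer
  strace -c -f sh -c 'sort -S1M fake_log.txt > /dev/null'
  strace -c -f sh -c 'sort -S1G fake_log.txt > /dev/null'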


I don't understand the downvotes. This is a fair criticism. The author even points out "programs as pipelines", which is literally the UNIX philosophy. There are tools that already exist on UNIX-likes that more people should use instead of reaching for a script.

I can sympathize with the author w.r.t wanting to use a single language you like for everything. However, after decades I've found this to be untenable. There are languages that are just simply better for one-off scripting (Perl, Python), and languages that aren't (anything compiled). Trying to bolt an interpreter onto a compiled language from the outside seems like a lot of work for questionable gain.


> There are languages that are just simply better for one-off scripting (Perl, Python), and languages that aren't (anything compiled). Trying to bolt an interpreter onto a compiled language from the outside seems like a lot of work for questionable gain.

One reason is deployment. Writing code in python/node/etc... implies the ability of the production environment to bootstrap a rather complicated installation tree for the elaborate runtimes required by the code and all its dependencies. And so there are elaborate tools (npm, venv, Docker, etc...) that have grown up around those requirements.

Compiled languages (and Go in particular shines here) spit out a near-dependency-free[1] binary you can drop on the target without fuss.
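For instance, cross-compiling for some target host is just a couple of environment variables (the target and paths here are only illustrative):

  # build a self-contained Linux/arm64 binary on any dev machine, copy it over, run it
  CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o mytool .
  scp mytool somehost:/tmp/mytool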

I deal with this in my day job pretty routinely. Chromebooks have an old python and limited ability to pull down dependencies for quick test runs. Static test binaries make things a lot easier.

[1] Though there are shared libraries and runtime frameworks there too. You can't deploy a Gnome 3 app with the same freedom you can a TCP daemon, obviously.


> Compiled languages (and Go in particular shines here) spit out a near-dependency-free[1] binary you can drop on the target without fuss

I think it's more accurate to say "static binaries" instead of "compiled languages." The same headaches exist with dynamically linked, compiled binaries (and sometimes they're worse since you don't have a dependency manager, unless you add one)

> Static test binaries make things a lot easier

I think this really depends on your target environment and how much control you have over it. If you're in a Ruby or Python shop, for example, all your servers already have the stack installed. If you're targeting end user devices, those can have a huge mess of different configs to account for


Meh. In my experience version churn with shared library dependencies is pretty minor. You have to worry about and work at it if you're doing stuff like deploying a single binary across a bunch of different linux distros. But the straightforward case of "build on your desktop and copy the file up to the host" is a routine thing you can expect to work.

It's nothing like that with Python or Node. The rule there is that you get something working locally and then spend a while reverse engineering a pip/venv recipe or manifest or whatever to make it work somewhere else. It's decidedly non-trivial.


I agree with you.

But I think for python you could also deploy a binary with pyinstaller.
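Something like this, assuming a script called yourscript.py:

  pip install pyinstaller
  pyinstaller --onefile yourscript.py   # single self-contained executable ends up in dist/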


The 'scripting' vs 'compiled' language is a false dichotomy. Awk, Perl, Python are compiled programs. What makes a 'scripting' language special? Dynamic typing? Lack of compile step/delay?

I could imagine a lifetime of collecting scripting macros/libs in lisp to be as good or better.


Python is not a compiled language.

However, the reason Bash is so prolific amongst sysadmins such as myself is that Bash scripts are portable and reliable across Debian, Arch, or RHEL-based distributions.

You don't have to import extra libraries, ensure that you are running the proper python environment, or be certain that pip is properly installed and configured for whatever extra source code beyond what is included out of the box.

Bash is the most consistent code you can write to perform any task you need when you have to work with Linux.


> Python is not a compiled language.

Python is (at least in the CPython implementation) compiled - to Python bytecode, which runs on the Python virtual machine.

It's not compiled to native code. (Unless you use one of the compilers which do compile it to native code, though they tend to support only a subset of Python.)
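You can watch the bytecode-compile step yourself, e.g.:

  python3 -m py_compile script.py                # bytecode lands in __pycache__/
  python3 -c 'import dis; dis.dis("a + b * 2")'  # disassemble the bytecode for an expression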


Bash is fine for small scripts.

Once you use it to manage complex data structures and flow, you are simply wasting time because you will have to rewrite it in Python or Go.


Another commenter beat me to it but still: sh / bash / zsh are quite fine up until certain complexity (say 500 lines), after which adding even a single small feature becomes a huge drag. We're talking hours for something that would take me 10 minutes in Golang and 15 in Rust.


I can actually agree with this take. Most of the opinions I've seen in this vein take some absurdly small limit, like 5 lines. 500, though? Yeah. My team rewrote a ProxySQL handler in Python because the bash version had gotten out of hand, and there were only a handful of people who could understand what it was doing. It passed 100% of spellcheck tests, and was as modular as it could possibly be, but modifying it was still an exercise in pain.


> It passed 100% of spellcheck tests

Just like your post does! :)


Ugh… Safari being overly helpful.


> portable and reliable to use across Debian, Arch or RHEL based distributions

Until you try to use a newer feature, or run the script on a Mac or BSD or any older bash.

SH code is completely portable, but bash itself can have quite a few novel features. Don’t get me wrong - I’m happy the language is dynamic and still growing. But it can make things awkward when trying to use a script from a newer system on an older server (and the author has been “clever”).
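A classic example: associative arrays need bash >= 4, and a stock macOS box still ships bash 3.2, so this line alone breaks the script there:

  declare -A seen   # on bash 3.2 this fails with something like "declare: -A: invalid option"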


> SH code is completely portable

Not exactly.

https://stackoverflow.com/questions/11376975/is-there-a-mini...


Also, the bash runtime is quite small - I think under 2 MB uncompressed - so it is pretty much included in every distro.

Installing Python or Node.js into your distro will inflate it by at least 30 MB, which is quite significant when dealing with containers.


The phrase "compiled language" doesn't mean anything. Python compilers exist.


> The 'scripting' vs 'compiled' language is a false dichotomy.

Not false, but perhaps in need of better definition. The term script has often denoted a trivial set of commands run by $interpreter.

"Scripting languages" have been seen as being in contrast to C, C++, Pascal, Java, SmallTalk, &c. The scripting languages remove from the user the need:

-a- to think about an extensive type system,

-b- to compile the logic, and

-c- to build for a specific architecture.


>> The 'scripting' vs 'compiled' language is a false dichotomy.

https://news.ycombinator.com/item?id=37206428


Static typing is the key differentiator.

That requires a level of bookkeeping which is helpful for large programs and a nuisance for small programs.


Closer to the truth is that static typing is a nuisance to a sole dev working in a short temporal period. Many successful startups get stuck with overgrown 'scripts' as platforms because it started as a one-man-programming shop.

I do have to add that Python, more than any other language I've used, results in code that works on the first try so often that it's no longer surprising.


> Dynamic typing?

Actually, it's the amount of reasoning a program has to perform at run time - that's what puts a language close to interpretation.


> The author even points out "programs as pipelines" which is literally the UNIX philosophy.

Yes - and if the thing I'm trying to do has a small input, will only be done once, etc., I will often just pipe `grep` to `sort` or whatever, because it's less typing, it's generally clearer to a wider range of people, etc.

But on larger inputs, or even things like doing a single pattern inversion mixed with a pattern match, I like awk.


One reason the author could be doing this is to reduce dependencies. Maybe they deploy to Windows or to some other environment not guaranteed to have those utilities. Also testing probably gets simplified.


And every time I see things like that, I feel like the person must be unaware of perl.

I've made this point before, but I still find it hilarious. For more than a decade, awk was dead. Like, dead dead. There was nothing you could do in awk that wasn't cleaner and simpler and vastly more extensible in perl. And, yes, perl was faster than gawk, just like gawk is faster than shell pipelines.

Then python got big, people decided that they didn't want to use perl for big projects[1], and so perl went out of vogue and got dropped even for the stuff it did (and continues to do) really well. Then a new generation came along having never learned perl, and...

... have apparently rediscovered awk?

[1] Also the perl 5 tree stagnated[2] as all the stakeholders wandered off into the weeds to think about some new language. They're all still out there, AFAIK.

[2] Around 2000-2005, perl was The Language to be seen writing your new stuff in, so e.g. bioinformatics landed there and not elsewhere. But by 2015, the TensorFlow people wouldn't be caught dead writing perl.


Perl never recovered from its "many ways to do things" label. It's a tired criticism of the language, but it's lodged in the brains of a generation of programmers, which is unfortunate.

Also, the classic sysadmin role which used to lean on Perl heavily sort of evolved with the rise of The Cloud, and automation tools like Chef, Puppet, and Ansible took over in that 2005-2015 time frame.


I am in the "awk > perl" camp. I think the idea of "vastly more extensible" is a negative for my scripting language, and "cleaner" just doesn't matter - I just want to write it the one time I want to use it and then be done with it. The awk language is really simple and quick to write.

By the way, I think this is why Perl lost to Python on larger scripting and programming projects - it's just easier to write (albeit harder to read, to antagonize the Python lovers out there).


I learned perl around that time, and I thought it was awful. And just about everything about it: the parameter passing, the sigils that made BASIC look like Dijkstra's love child, the funky array/scalar coercion, and the bloody fact that it couldn't read from two files at once even though the docs suggested it should work. They didn't say so explicitly, because perl was pretty badly documented. My boss started writing object oriented perl, and that made perl unreadable even to perl experts.

AWK, on the other hand, is simplicity itself. Sure, it misses a few things, but for searching through log files or db dumps it's an excellent tool. And it's fast enough. If you really need much more speed, there are other tools, but I would rather rewrite it in C than try perl again.


They taught awk to my boy in bioinformatics as part of his degree. I was like Vito Corleone in the funeral home when he showed me the FASTA parsing awk code they were working on.


I mostly use awk over perl because awk is completely documented in one man page, so it's easy to see whether awk will be fit for purpose or whether I should write it using a real programming language. I learned Perl over a decade ago, but not the really concise dialect you would use on the command line for stuff I'd use awk for, and I've forgotten almost all of it now. At least with awk it's easy to relearn the functions I need when I need it.


Right, which is sort of my point. 20 years ago, "everyone" knew perl, at least to the extent of knowing the standard idioms for different environments that you're talking about. And in that world, "everyone" would choose perl for these tasks, knowing that everyone else would be expert enough to read and maintain them. Perl was the natural choice.

And in a world where perl is a natural choice for these tasks, awk doesn't have a niche. Because at the end of the day awk is simply an inferior language.

Which is the bit I find funny: we threw out and forgot about a great tool, and now we think that the ancestral toy it replaced is a good idea again.


That's a fair criticism. I know Perl can do pretty amazing things with text, but I've never bothered to learn it.

EDIT: I decided to ask GPT-4 to translate the gawk script to Perl. I make zero claims that this is ideal (as stated, I don't know Perl at all), but it _does_ produce the same output, albeit slightly slower than the gawk script.

  $ hyperfine "perl -lane '\$ips{\$F[0]}++; END {print \"\$ips{\$_} \$_\" for (sort {\$ips{\$b} <=> \$ips{\$a}} keys %ips)[0..9]}' fake_log.txt"
  Benchmark 1: perl -lane '$ips{$F[0]}++; END {print "$ips{$_} $_" for (sort {$ips{$b} <=> $ips{$a}} keys %ips)[0..9]}' fake_log.txt
  Time (mean ± σ):      1.499 s ±  0.006 s    [User: 1.447 s, System: 0.050 s]
  Range (min … max):    1.490 s …  1.507 s    10 runs


I would have gone with an iteratively-built list, FWIW, and avoided the overhead in parsing fields the script won't use:

    perl -e 'for $i (<>) { $i =~ s/ .*//; push @list, $i; }; print(sort(@list));'


Sample of one. I came of age on Linux in the late 90s/early 00s. Through other nerds on IRC channels I became familiar with Perl and didn't like it. I also picked up basic awk in the context of one-liners for shell pipelines and it was pretty nice for that. Easier to remember than the flags for cut and friends.

Learning awk a bit more deeply in recent years has been good too. I can write one liners that do more. I shipped a full awk script once, for something unimportant, but I would never do that again. For serious text munging these days I'd rather write a Rust program.


way to completely miss the point and turn this into a weird pissing competition (btw your "simple" awk example is super complicated and opaque to someone who doesn't have the awk man page open in front of them)

The script package looks really cool and I'll definitely try it out, cause honestly even though I do a lot of bash scripting it's super painful for anything but something super simple.


If someone doesn't know awk, then of course it'll be complicated and opaque - the same is true of practically any language. One-liners in general also tend to optimize for space. If you wanted it to be pretty-printed and with variable names that are more obvious:

  {
    PROCINFO["sorted_in"] = "@val_num_desc"
    top_ips[$1]++
  }
  END {
    counter = 0
    for (i in top_ips) {
      if (counter++ < 10) {
        print top_ips[i], i
      }
    }
  }
But also, if you read further up in the thread, you'll see that another user correctly identified the bottlenecks in the original pipeline, and applying those optimizations made it about 3x as fast as the awk one. Arguably, if you weren't familiar with the tools (and their specific implementations, like how GNU sort and BSD sort have wildly different default buffer sizes), you'd still be facing the same problem.

At least half of what people complain about with shell scripts can be solved by using ShellCheck [0], and understanding what it's asking you to do. I disagree with the common opinion of "anything beyond a few lines should be a Python script instead." If you're careful with variable scoping and error handling, bash is perfectly functional for many uses.

[0]: https://www.shellcheck.net
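For reference, running ShellCheck is a single command, and the kind of thing it flags typically looks like this (the file name is made up):

  shellcheck deploy.sh
  # typical finding: SC2086 - "Double quote to prevent globbing and word splitting."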


> If someone doesn't know awk, then of course it'll be complicated and opaque - the same is true of practically any language

I don't think this is true. Before I learned Go, I could follow along most Go programs pretty well, and learning Go well enough to get started took less than an hour. Every attempt I've made to learn more Awk, I've bounced off.


Really? I learned awk by watching a one-hour YouTube video one afternoon. It being a DSL really makes it super easy to learn, and this, to me, suggests you probably haven't given it much time.


Good for you, their point still holds up. Languages like Python and Go are more readable than awk and bash. They are designed to be that way, and many many years of effort have been put into them for that specific purpose.

Whereas if you know awk and bash, then they can be incredibly useful in a pinch. It doesn’t knock how powerful they are. I think it is worth learning. But if something needs to be maintained then there is an argument for Python/Go/whatever.


> If you're careful with variable scoping and error handling, bash is perfectly functional for many uses.

“Loaded guns are perfectly functional for juggling, just be careful with the trigger and you won’t shoot yourself in the foot!”

You are technically correct but why bother with being careful when you could just avoid writing bash?


Because it's really fast to iterate on if you know it, it's available basically everywhere and has no external dependencies, and you don't have to compile it.


Mawk can be even faster, although missing some features of GNU Awk 5.


It is always "horses for courses" and there may be times when the five concurrent cores with the shell pipeline will beat the single core awk script.


I don't do a lot of shell scripting type things in Go because it's not a great language for it, but when I do, I take another approach, which is just to panic. Generics offer a nice little

    func Must[T any](x T, err error) T {
        if err != nil {
            panic(err)
        }
        return x
    }
which you can wrap around any standard "x, err :=" function to just make it panic, and even prior to generics you could wrap a "PanicOnErr(justReturnsErr())".

In the event that you want to handle errors in some other manner, you trivially can, and you're not limited to just the pipeline design patterns, which are cool in some ways, but limiting when that's all you have. (It can also be tricky to ensure the pipeline is written in a way that doesn't generate a ton of memory traffic with intermediate arrays; I haven't checked to see what the library they show does.) Presumably if I'm writing this in Go I have some other reason for wanting to do that, like having some non-trivial concurrency desire (using concurrency to handle a newline-delimited JSON file was my major use case, doing non-trivial though not terribly extensive work on the JSON).

While this may make some people freak, IMHO the real point of "errors as values" is not to force you to handle the errors in some very particular manner, but to make you think about the errors more deeply than a conventional exceptions-based program typically does. As such, it is perfectly legal and moral to think about your error handling and decide that what you really want is the entire program to terminate on the first error. Obviously this is not the correct solution for my API server blasting out tens of thousands of highly heterogeneous calls per second, but for a shell script it is quite often the correct answer. As something I have thought about and chosen deliberately, it's fine.


If you're not familiar with Go there is one detail missing from this post (though it's in the script README) - what a complete program looks like. Here's the example from https://github.com/bitfield/script#a-realistic-use-case

    package main

    import (
        "github.com/bitfield/script"
    )

    func main() {
        script.Stdin().Column(1).Freq().First(10).Stdout()
    }


If one were actually going to use something like this, I’d think it’d be worth implementing a little shebang script that can wrap a single-file script in the necessary boilerplate and call go run!


That's a really fun idea. I got that working here: https://til.simonwillison.net/bash/go-script

Now you can run this:

    cat file.txt | ./goscript.sh -c 'script.Stdin().Column(1).Freq().First(10).Stdout()'
Or write scripts like this - call it 'topten.sh':

    #!/tmp/goscript.sh
    script.Stdin().Column(1).Freq().First(10).Stdout()
Then run this:

    chmod 755 topten.sh
    echo "one\none\ntwo" | ./topten.sh


This is fantastic. Thanks for sharing and also including docs on your process and usage!


Hmm, I wonder if this is Microsoft's real endgame with allowing the single line C# syntax.


The whole point of using Go is to explicitly handle errors as they happen. All of these steps can fail, but it’s not clear how they fail and if the next steps should proceed or be skipped on previous failures. This is harder to reason about, debug, and write than grep and bash.


It defaults to not running the rest if a step fails, and the error result is accessible via usual mechanisms.

  _, err := script.Foo(...).Bar(...).Stdout();
  if err != nil {
    log.Fatal(err)
  }
is sufficient for a quick scripting hack designed to be run interactively.

I don't see it as a lot different to bash scripts with -e and pipefail set, which is generally preferable anyway.
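i.e. the usual preamble:

  set -e -o pipefail   # abort on the first failing command; a pipeline's status reflects any failed stage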

Plenty of go code does

  if err != nil {
    return nil, err;
  }
for each step and there are plenty of cases where you only care -if- it failed plus a description of some sort of the failure - if you want to proceed on some errors you'd split the pipe up so that it pauses at moments where you can check that and compensate accordingly.

(and under -e plus pipefail, "error reported to stdout followed by aborting" is pretty much what you get in bash as well, so I'm unconvinced it's actually going to be harder to debug)


>The whole point of using Go is to explicitly handle errors as they happen

That's hardly the whole point of using Go.

The friendlier syntax (and in this case the DSL) is an even bigger point.

In any case, you can trivially get at the error at the point it occurred:

    n, err := script.File("test.txt").Match("Error").CountLines()


I believe error handling looks like this:

    package main

    import (
        "log"

        "github.com/bitfield/script"
    )

    func main() {
        _, err := script.Stdin().Column(1).Freq().First(10).Stdout()
        if err != nil {
            log.Fatal(err)
        }
    }
Errors are "remembered" by the pipeline and can be processed when you get to a sink method.


From a technical point of view, nothing prevents the scripting package from being just as informative about errors as bash, or from having a helper to log and clear the error. If that's not already the case, I'd call it a bug.


Inspired by comments in this thread, I threw together a Bash script that lets you do this:

    cat file.txt | ./goscript.sh -c 'script.Stdin().Column(1).Freq().First(10).Stdout()'
You can also use it as a shebang line to write self-contained scripts.

Details here: https://til.simonwillison.net/bash/go-script


I like Go, but its insistence on not permitting unused imports and unused variables make it unsuitable for scripting, imo.

For scripting I want something that I can be fast and messy in. Go is the opposite of that.

It's ok, a language doesn't have to be good at everything.


There should totally be a compiler flag to not require those


Maybe there could be a whole class of things that are like errors, but not as severe, and flags that deal with them as a group! We could call them "warnings"


Oh you mean things that are ignored in every situation ever


If you choose to do that with warnings, that's your prerogative. I pretty much take the opposite approach personally: before putting code into review (or merging, if it's a personal project with no reviewers), I inspect every warning and decide whether to fix it by changing the code so it isn't generated, or to suppress it manually in that specific location. 99% of the time the warning is a sign of something actually wrong, but for the 1% where I know I don't care (and for the very frequent case where I'm still mid-implementation during development), it's much better that it doesn't completely block me.


I agree. When I am scripting, I want to have a quick feedback loop. You can't really do that with Go because it doesn't have as good introspection and debugging capabilities as a scripting language like Ruby, and it doesn't have exceptions, which means that error handling is more verbose than necessary.

Also, I like being able to make modifications on the fly, so doing something in Ruby, I can just open the file, make adjustments, and I am done. With Go, I have to compile it and move it back into my path, which is really tedious.


From Sanjay Ghemawat, 9 years ago

https://github.com/ghemawat/stream


Shell scripting is quite fine up until certain complexity (say 500-1000 lines), after which adding even a single small feature becomes a huge drag. We're talking hours for something that would take me 10 minutes in Golang and 15 in Rust.

Many people love to smirk and say "just learn bash properly, duh" but that's missing the point that we never do big projects in bash, so our muscle memory of bash is always kind of shallow. And by "we" I mean "a lot of programmers"; I am not stupid, but I have to relearn bash's intricacies almost from scratch every time, and that's not productive. It's very normal for things to slip from your memory when you're not using them regularly. To make this even more annoying, nobody will pay me to work exclusively with bash for 3 months until it gets etched deep into my memory. So there's that too.

I view OP as a good reminder that maybe universal-ish tools to get most of what we need from shell scripting exist even today but we aren't giving them enough attention and energy and we don't make them mainstream. Though it doesn't help that Golang doesn't automatically fetch dependencies when you just do `go run random_script.go`: https://github.com/golang/go/issues/36513

I am not fixating on Golang in particular. But IMO next_bash_or_something should be due Soon™. It's not a huge problem to install a single program when provisioning a new VM or container either, so I am not sure why people are so averse to it.

So yeah, nice article. I like the direction.

EDIT: I know about nushell, oilshell and fish but admittedly never gave them a chance.


This is satire, right? I think commenters are completely missing the point.

https://en.m.wikipedia.org/wiki/A_Modest_Proposal


The submitted title was "Scripting with Go: A Modest Proposal" but the phrase "modest proposal" doesn't appear in the article, so I've taken it out.

"Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html


The unix philosophy of having small programs that take in input, process it, and return a result has proven to be a success; I just never understood why the next logical step of having these programs in library form never became a thing. I guess shells are a bit useful, but not as useful as a decent repl (common-lisp or the jupyter repl) where these programs could be used as if they were functions.


I ended up using this for my cli scripting needs. https://github.com/google/zx


Why is the Google project advertising Webpod?


Previous discussion (March 11, 2022 | 243 points | 66 comments):

https://news.ycombinator.com/item?id=30641883


Would love to use more Golang - amazing build system and cross-compiler built in. "All in one" binaries are the best thing ever. I adore most of the ideas in the language.

.... but there are just soooo many little annoyances / inconveniences which turn me off.

- No Optional Parameters. No Named Parameters. Throw us a bone Rob Pike, it's 2023. Type inferred composite literals may be an OK compromise.. if we ever see them: https://github.com/golang/go/issues/12854

- Unused import = will not compile. Unused variable = Will not compile. Give us the ability to turn off the warning.

- No null safe or nullish coalescing operator. (? in rust, ?? in php, etc.)

- Verbosity of if err != nil { return err; }

- A ternary operator would be nice, and could bring if err != nil to 1 line.

- No double declarations. “no new variables on left side of :=” .. For some odd reason “err” is OK here... Would be highly convenient for pipelines, so each result doesn't need to be uniquely named.

I'd describe Go as a "simple" language- Not an "easy" language. 1-2 lines in Python is going to be 5-10 lines in golang.

Note: Nim has most of these..


Agreed!

Shameless plug: this is why I built Risor.

https://github.com/risor-io/risor

Keep in the Go ecosystem, retain compatibility with the Go programs you already have, but have a much more concise scripting capability at your disposal.


Looks more useful than OP.


The error handling verbosity in Go should be blamed partially on the formatter that replaces one-liner if err != nil { return err } with 3 lines.


I agree. Go has such amazing infrastructure it's a huge shame the language is so stubbornly basic.


Have you tried Deno?


URL?


https://deno.land/

This is Typescript, but you have language complaints, and it will build binaries.


Perl was literally made for that, just use it


Or Ruby, but it requires Ruby to be installed, so I guess Crystal is a better alternative?


I totally agree with you on that.


      export LC_ALL=C
      awk '!a[$1]++' access.log|head 
If access.log is large enough, awk will fail.

When this happens, one can split access.log into pieces, process separately then recombine.

But that's more or less what sort(1) does with large files, creating temporary files in $TMPDIR or other user-specified directory after -T if using GNU sort.
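For example, a disk-backed dedupe with GNU sort looks something like this (the spill directory and buffer size are just placeholders):

  sort -u -T /mnt/scratch -S 25% access.log > access.uniq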

There was a way to eliminate duplicate lines from an unordered list using k/q, without using temporary files but I stopped using it after Kx, Inc. was sold off and I started using musl exclusively. q requires glibc.

For example, something like

     #!/bin/sh
     # usage: $0 file
     echo "k).Q.fs[l:0::\`:$1];l:?:l;\`:$1 0:l"|exec q >null;
Can this be done in ngn/k?

The other approach I use to avoid temporary files is to just put the list in an SQL database, add a UNIQUE constraint, and update the database.


Out of curiosity, what's your reasoning for using musl exclusively?


I put together a go "sh-bang" line so you can just chmod +x your .go file and run it (and it works with go fmt unlike other options).

    /*usr/bin/env go run "$0" "$@"; exit $? #*/
It's fun try it out! Just make this the first line of the file.


I have been thinking that JS template literals could be a great replacement for shell programming, letting you build more powerful syntax that emulates a lot of bash's useful features while still having the power of a proper programming language.

for example:

  import { jsh, cat, grep, PipeOutput } from 'jsh';
  // type PipeOutput = { stdout: ReadableStream, toString: () => Promise<string>, extra: Record<string,any> }

  function countLines(input: PipeOutput, argv: string[]): PipeOutput {
    // ...
  }
  const textToLookFor = process.argv[1]
  const output: PipeOutput = jsh`${cat} file.txt | ${grep} ${textToLookFor} | 
  ${countLines}`
  console.log(output.toString())


The pipe-like code with dot notation reminds me a lot of jQuery. That's a compliment.


Take a look at Risor and its pipes capability.

https://github.com/risor-io/risor#quick-example

Stay in the Go ecosystem, but gain pipes, Python-like f-strings, and more.

(I'm the author)


I agree with this, and also in a complimentary way, but it all seems very non-idiomatic for Go. But I am not a Go expert by any means.


Try jq if you haven't already.


Yes, but jq’s syntax is impossible to memorize for me.

gron | rg

FTW


Oh very neat, thanks for posting I will definitely give this a try.


Tangentially related: I posted a shebang for scripting in rust some years ago, if anyone is interested: https://neosmart.net/blog/self-compiling-rust-code/


This post is several years old fwiw.


There's a cute little icon telling us "Feb 21" right at the top but omitting the year, which would have been ever so helpful.


It's wrapped with a nice semantic tag with the actual date:

  <time class="dt-published date-callout" datetime="2022-02-21" pubdate="">
    <div class="date-wrapper">
      <span class="month">Feb </span><span class="day">21</span>
    </div>
  </time>
Not sure if that's correct or not, though.


Interesting. I do something similar with my task package, https://github.com/kardianos/task, which is in turn loosely based on another package from 10-15 years ago.


That sounds interesting, but the package is unfortunately undocumented. I tried https://pkg.go.dev/github.com/kardianos/task, but that doesn't help me understand it either. It's missing a high level explanation of what to use it for, its limits and some decent examples.



> cut -d' ' -f 1 access.log |sort |uniq -c |sort -rn |head

FYI Go has Compact now:

https://godocs.io/slices#Compact


Discussed at the time:

Scripting with Go - https://news.ycombinator.com/item?id=30641883 - March 2022 (66 comments)


terrible idea



