Prototyping with shell scripts

I posted a thread on Twitter about how I was adding my Board Game Geek and Steam reviews to this blog. I mentioned how much I enjoy using shell scripts for prototyping ideas and Steve Bennett replied:


This is a companion discussion topic for the original entry at https://jlericson.com/2021/08/29/shell_prototyping.html

Well, seems I should post a rebuttal here then? :slight_smile:

My bias is that I really, really dislike writing or maintaining shell scripts. So much so that a while ago I adopted a rule: if a shell script becomes more complex than unconditionally executing a series of commands in order, I have to rewrite it in NodeJS. I’ve been happier ever since.

Anyway, my specific counterpoints:

  • There is far less consistency between programs/commands than between libraries in a language. Different commands use command line arguments completely differently (double dash? one dash? no dashes?). So, as your post indicates, the knowledge you build up tends to be an esoteric memorisation of the oddities of each individual tool.
  • Related, perhaps: the behaviour of most common *nix tools was set in stone decades ago. This is not a good thing. NPM libraries regularly iterate on their interfaces, coming up with much better, more ergonomic ways of interacting with them, often converging on similar designs. You still want that v1.0 behaviour? Sure, you can still have it. But if you’re new to the tool, you’ll find v6.0 much easier and more intuitive.
  • Using an actual programming language means you don’t rely on weird tricks for each tool. That [1-5] trick was cute, but it only works for curl. Whereas in JavaScript I can easily iterate from 1 to 5 and use a template string to drop in the appropriate value, and that will work for curl, wget, rsync, whatever (see the sketch after this list).
  • It’s a nitpick, but, no, the Bash REPL isn’t “the world’s tightest read-eval-print loop since each command is evaluated instantly and automatically after the user presses Return”. Try the REPL in your browser these days: it evaluates the expression you’re typing before you even press Return.
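To make the iteration point concrete, here is a minimal sketch of the JavaScript version. The example.com URL and the page/N.html layout are placeholders, and it assumes curl is installed; swapping in wget or rsync means changing one string, not learning a new globbing syntax:

```javascript
// A plain loop plus a template string replaces curl's [1-5] URL globbing.
// example.com and the page/N.html layout are placeholders for illustration.
const { execFileSync } = require("child_process");

for (let i = 1; i <= 5; i++) {
  const url = `https://example.com/page/${i}.html`;
  execFileSync("curl", ["-s", "-o", `page_${i}.html`, url]);
}
```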

But really, the biggest problem with writing a shell script is the risk that it might become important and need things like proper error handling, asynchronous behaviour, configuration management, etc. At that point one of three things will happen:

  1. You’ll push on with Bash and end up with an unreadable, unmaintainable nightmare.
  2. You’ll sigh and rewrite the thing in NodeJS or Python and wish you’d done it sooner.
  3. You’ll do neither, continuing to have a script which doesn’t do the things it should, and is also unreadable and unmaintainable.

I strongly disagree. Consider tar, which has a notoriously terrible interface. It would be great to fix, but doing so would break every script and documentation page that ever mentioned tar. The upside is that scripts I wrote three jobs ago still work without modification. If someone writes a better tar (faster, more compression options, etc.), I can get that functionality by changing a symlink. (Though that does present other problems, of course.)
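To make the symlink idea concrete, here is a sketch, assuming bsdtar (from libarchive) is installed and that /usr/local/bin comes before /usr/bin on the PATH; the paths are assumptions, not a recipe:

```bash
# Swap in a different implementation behind the stable `tar` name.
# bsdtar and both paths are assumptions; adjust for your system.
sudo ln -sf /usr/local/bin/bsdtar /usr/local/bin/tar
tar --version   # old scripts keep calling `tar` and get the new binary
```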

I can see how this is attractive if you are actively developing software using those libraries. But it’s a disaster for people just wanting to use software that depends on them. We’re using Gatsby for EDB Docs. We’re a few releases behind and yes, we could stay there. But if we want any new functionality we’re going to need to update not just Gatsby, but dozens of plugins and libraries that have changed in the meantime. I’ve planned half a day to do the update and I’m worried it won’t be enough time.

Or take this blog. It uses Jekyll, which is written in Ruby. Unlike with Node, I’m reasonably fluent in Ruby, so I’ve been updating to the latest versions, trusting that I’ll be able to sort out the problems (and learn something along the way). Unfortunately, that’s not a good plan. Right now there’s something wrong with my configuration and the --watch option is broken for me. So if I change content on my blog, I need to kill the server and restart it to see the rendered changes. (Talk about a slow REPL!) Most likely someone failed to update a dependency somewhere and I have incompatible libraries.

It seems to me that we didn’t learn from the mistakes of DLL hell. But that’s another rant.

If I’ve done my job prototyping correctly, I don’t really mind rewriting my code in another language. 90% of the work is figuring out where the data I need is hidden, scripting the business logic, deciding what the output should look like, and so on. Rewriting gives me an excuse to fix all the problems I only discovered when the work was done.

Take my Ruby script to import Stack Exchange posts. It started as a collection of curl commands to help me figure out how the API works. Once I got the basic logic set, I decided it was useful enough to share with others. So I added some tests, cleaned up the command line interface, did some error checking, and took care of all the other things my prototype didn’t need.
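For flavour, here is a sketch of that kind of exploratory curl command, assuming the public Stack Exchange API plus jq for pretty-printing; the user ID and site are placeholders:

```bash
# Poke at the Stack Exchange API to see where the data lives.
# The user ID and site are placeholders; the API always gzips its
# responses, so --compressed is needed to get readable JSON out.
curl -s --compressed \
  "https://api.stackexchange.com/2.3/users/1/posts?site=stackoverflow" \
  | jq '.items[0]'
```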

And that’s ultimately my problem with prototyping in Ruby, Python, or whatever. Instead of solving the problem I’m working on, I end up wasting time adding a command line option that I don’t need or building a test that isn’t useful. (It’s like messing with fonts when you should be writing.) With a shell script I’m free to ignore stuff like that. If my script works for most situations but fails on one case because, say, someone insisted on putting an exclamation mark in the title of their game, I don’t have to stop everything to fix it. I just make a note to fix it later.
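To illustrate the exclamation-mark hazard: at an interactive Bash prompt, ! can trigger history expansion, and a title used as a filename needs careful quoting. A sketch, with a placeholder title and a hypothetical sanitizing one-liner:

```bash
title='Galaxy Trucker!'   # single quotes keep '!' literal at an interactive prompt
# A quick-and-dirty sanitizer (hypothetical) before using the title as a filename:
safe=$(printf '%s' "$title" | tr -c 'A-Za-z0-9._-' '_')
echo "review saved as ${safe}.md"   # prints: review saved as Galaxy_Trucker_.md
```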

Anyway, I see your point about getting stuck with a janky Bash script, and I expect most people will be able to build a prototype quickly in whatever language they are already most familiar with.

Weirdly, I find the tar interface fine, because you’re always using exactly the same two sets of options: either -zvcf or -zvxf.
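Spelled out (the archive name and directory are placeholders):

```bash
tar -zvcf site.tar.gz site/   # create:  -z gzip, -v verbose, -c create, -f archive file
tar -zvxf site.tar.gz         # extract: the same flags, with -x in place of -c
```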

But the real mistake Unix made was not distinguishing between human interfaces and scripting interfaces. A simple convention could have been that a first argument of, say, -s would tell every command to interpret all remaining arguments as a scripting API, which would remain stable, while the daily command line usage could evolve. (Similar to how I follow the practice of using --long-option-names in scripts, but -l shortcuts on the command line.)

Even better, they could have, and should have, versioned these APIs. Then you could write tar -s1 ... and get the same decades-old API, while tar -s6 might be more modern and standardised.

If I’ve done my job prototyping correctly, I don’t really mind rewriting my code in another language. 90% of the work is figuring out where the data I need is hidden, scripting the business logic, deciding what the output should look like, and so on.

Probably a difference in skillset. 90% of the time (maybe exaggerating) is spent googling how to use commands like xargs, if, etc., which is completely wasted effort when I switch to a real programming language. The business logic is much easier to reason about, and to write properly, in a real language, rather than as a simplified version that is expressible in Bash.

If my script works for most situations but fails on one case because, say, someone insisted on putting an exclamation mark in the title of their game, I don’t have to stop everything to fix it.

My experience is that when you write code like this in a real programming language, issues like “filename contains a weird character” just never crop up. You don’t need to stop everything to fix it, because it was never broken in the first place.
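A minimal Node sketch of that point, with a placeholder title; the string goes straight into the filesystem call with no quoting or expansion rules to trip over:

```javascript
// A title containing '!' or spaces is just data in a string; there is no
// word splitting, globbing, or history expansion to break it.
const fs = require("fs");

const title = "Galaxy Trucker!";   // placeholder title
fs.writeFileSync(`${title}.md`, "My review...\n");
console.log(`wrote ${title}.md`);
```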

Generally the Unix solution is to have a newly named command. So gawk replaced awk, gmake replaced make, and gtar replaced tar. (So the GNU project was an innovator here.) Of course the newer commands tend to be backward compatible to a degree. A big reason for that is the POSIX standard for shells. The value is not in modernizing the API but in having a standard across operating systems.

The value is not in modernizing the API but in having a standard across operating systems.

Indeed. I’d argue that in the context of one person developing scripts that only need to run on their own system, this contributes very little value, and is greatly outweighed by the downside of being stuck in the 70s when you’re developing in the 2020s.