I’ll say that during a recent week when I was forced to use an LLM, I found Claude Opus to be extremely poor at referencing this guide: https://mywiki.wooledge.org/BashPitfalls
It took almost an hour to get Claude to write me a shell script I considered to be of acceptable quality. It completely hallucinated several of the points in that guide, forcing me to go read the guide myself to confirm that the model was making things up. The same task would have taken me about 5 minutes by hand.
I believe GIGO applies here. 99% of shell scripts on the internet are unsafe and terrible (looking at you, set -euo pipefail - sketch below), and Claude is much more likely to generate god-awful garbage because of the inherent bias in its training data.
And as for unit tests? IMO, anything other than property-based testing is irrelevant. If you’re using something like Pydantic, you can auto-generate a LOT of your tests from the rich type annotations available in that library, combined with Hypothesis. I tend to write a testing framework once, then special-case property tests for things that fall outside my models. None of this is super helpful for big ugly codebases with a lot of inertia around practices, but that hasn’t been my environment, thankfully.
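Since I keep picking on set -euo pipefail: here’s a minimal sketch of my own (not Claude output) showing one of the standard gotchas. Under pipefail, a writer killed by SIGPIPE fails the whole pipeline, so set -e aborts a script that did exactly what you asked.

```bash
#!/usr/bin/env bash
# Minimal sketch, for illustration: a pipeline that "fails" under
# pipefail even though it behaved correctly.
set -euo pipefail

# head exits after reading one line and closes the pipe; yes is then
# killed by SIGPIPE (exit status 141 on typical systems). Under
# pipefail the pipeline as a whole reports 141, and set -e aborts the
# script right here.
first=$(yes | head -n 1)

echo "never reached: ${first}"
```

The usual “fix” is to bolt || true onto anything that might legitimately return non-zero, at which point you have to ask what the incantation was buying you in the first place.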
WTF are you expecting Claude to code in bash?
I have found Sonnet and Opus to both be very capable in bash, but then, I don’t usually ask bash to do super-complex things - its syntax is just too screwy to think about big applications in it.
I will say, you might be misguiding the LLM by filling it full of bad examples before starting. It’s like the advice about not staring at a tree downslope while skiing: if you’re fixated on it, you’re MORE likely to hit it.
I gave it no advice, and all I wanted it to do was generate a script to tell me the file type of the newest file in the current directory - a very trivial piece of code. Each time it generated something I disliked, I told it “don’t do this, reference this guide for the correct thing to do,” or “don’t do that, do it in such a way that X happens.” It was about 20 lines of bash in the end.
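For reference, here is roughly the shape of the script I was after. This is my own sketch following the Wooledge guidance (a glob plus the -nt test instead of parsing ls), not Claude’s eventual output:

```bash
#!/usr/bin/env bash
# Sketch: print the file type of the newest regular file in the
# current directory, without parsing ls (BashPitfalls #1 territory).

shopt -s nullglob dotglob    # empty dir -> empty glob; include dotfiles

newest=
for f in ./*; do
    [[ -f $f ]] || continue                       # skip dirs, sockets, etc.
    [[ -z $newest || $f -nt $newest ]] && newest=$f
done

if [[ -z $newest ]]; then
    echo "no regular files in $PWD" >&2
    exit 1
fi

file -- "$newest"    # ./ prefix and -- guard against dash-prefixed names
```

Twenty-odd lines, one external command, and nothing that chokes on spaces, newlines, or leading dashes in filenames. That is the bar I was holding it to.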
I was expecting it to write me a bash script because that’s the example that everyone, without fail, says will work well. “I just used Claude to write a little throwaway script to move some files around” were the exact words a colleague used.
Bash is a shitty, unsafe language. I don’t write large programs in it. I expect “throwaway scripts” to still be written in a way that defends against all of the innumerable shitass footguns present in the language. Claude was incapable of doing this in a reasonable time frame.
I also dislike the Python and Go it generated, while we’re at it. It produces overly verbose, overly documented, poorly performing code. It was also fucking dog shit at referencing runbooks and documentation in a local folder when I was on call and responding to alerts.
It sounds like you’re quite partial to Claude, and I hope it’s been a very good and helpful tool for you. I did not find it to be particularly helpful for me. It was very good at putting me in a sour mood, however.
> I expect “throwaway scripts” to still be written in a way that defends against all of the innumerable shitass footguns present in the language. Claude was incapable of doing this in a reasonable time frame.
There’s the problem with your expectations. You may be able to follow your little guide to bash problems and “best practices,” but defending against the innumerable shitass footguns present in bash is not a task that can be accomplished by anybody in a reasonable timeframe…
I wasn’t so thrilled with Claude in the October 2025 timeframe - Opus was slow and costly and wrote unnecessarily weird solutions for simple problems, and Sonnet would still get caught in loops where each bug fix created a new bug. It (and the other models, like Gemini, GPT, etc.) has improved significantly since then. Back then it wasn’t hard to “make the tool look bad.” It’s still not too hard to make the tools look bad today if you try, but it’s much easier for me to make them look good.
I, too, would be in a more sour mood if I hated the tools and still had to demonstrate to management “how we’re going to leverage AI for software development” - which is on our goals this year.
Why not just give it shellcheck and have it run that on every script it creates?
Shellcheck, while good, doesn’t capture all best practices in my opinion. There are many items in that doc which shellcheck would happily allow, worst of all being set -euo pipefail.
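To make that concrete, here’s a toy script of my own. As far as I know, shellcheck has nothing to complain about here, yet the script silently dies in the most ordinary case:

```bash
#!/usr/bin/env bash
# Toy example, for illustration: shellcheck-clean, but broken in
# practice because of set -e semantics.
set -euo pipefail

logfile=$1

# grep -c exits with status 1 when there are zero matches, so on a
# clean log set -e kills the script on this line and "found 0 errors"
# is never printed. The common workaround is appending `|| true`,
# which quietly undoes the protection set -e was supposed to provide.
errors=$(grep -c 'ERROR' -- "$logfile")

echo "found ${errors} errors in ${logfile}"
```

shellcheck is great at quoting and syntax mistakes; it doesn’t flag this class of behavioral bug, which is exactly the class that guide is mostly about.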