• neclimdul@lemmy.world · 2 days ago

    I kind of agree it’s a multiplier. But so far, every time I’ve had it do something, it’s written such an ugly turd that I have to rework it all, taking more time than if I’d just solved the problem myself to start with. Maybe someday, but it’s not up to the quality I expect of development.

    • scarabic@lemmy.world · 13 hours ago

      I got a lot of garbage when I didn’t know what I was doing and just tried AI once or twice a week with lazy prompts, expecting perfection without iteration. I’d huff and complain about how I had to fix things, whereas now I just tell it what to fix, or better yet, how to get it right the first time. I’ve built up my library of skills and prompts and refined them quite a bit. The models keep getting smarter. You should really look at your tools and methods - it sounds like you’re stuck in 2024.

      • MangoCats@feddit.it · 11 hours ago

        I’ve been using it rather heavily since about last October, and I definitely notice the models getting better, with the tooling around them starting to do things automatically that I had to prompt for manually last year (especially remembering key instructions). I also believe I’m getting better at using them; how much that contributes to my overall results is extremely hard to quantify, but the feeling is definitely there. Last October I would “just ask” for things without a documented set of requirements. Today, I know a requirements document is necessary whenever the complexity rises above… well, above a one-off example of how to do something relatively trivial.

    • ChickenLadyLovesLife@lemmy.world · 2 days ago

      I kind of agree it’s a multiplier.

      It’s definitely a force multiplier; it’s just that the factor after the X can be less than 1.0.

    • FaceDeer@fedia.io · 2 days ago

      Have you tried giving it coding standards and other preferences about how you like your code to be organized? I’ve found that coding agents can be quite adaptable to different styles: you can put rules like “try to keep functions less than 100 lines long” or “include assertions validating all function inputs” into your coding agent’s general instructions, and it’ll follow them.
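
      A sketch of what such a general-instructions file might look like - the file name and the last two rules are invented examples; only the first two come from this comment:

      ```
      # CODING_STANDARDS.md - a made-up name; the agent reads this at session start
      - Try to keep functions less than 100 lines long.
      - Include assertions validating all function inputs.
      - Prefer small, focused modules; avoid global mutable state.
      - Run the project linter and the unit tests before declaring a task done.
      ```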

      For me, one thing that’s a huge fundamental improvement is telling the agent to create and run unit tests for everything. That way, when it does accidentally mess up, it immediately catches the problem and usually fixes it in the same session without further intervention. Unit tests used to be more trouble than they were worth most of the time; now I love them.
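
      A rough sketch of that loop in Python with pytest - the function and tests here are invented for illustration; the point is that the agent writes and runs both halves:

      ```python
      import pytest

      def parse_port(value: str) -> int:
          """Parse a TCP port, validating the input per the standards above."""
          assert isinstance(value, str), "value must be a string"
          port = int(value)
          if not 0 < port < 65536:
              raise ValueError(f"port out of range: {port}")
          return port

      # Tests the agent generates and runs alongside the change:
      def test_parse_port_accepts_valid_values():
          assert parse_port("8080") == 8080

      def test_parse_port_rejects_out_of_range():
          with pytest.raises(ValueError):
              parse_port("70000")
      ```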

      • MangoCats@feddit.it · 11 hours ago

        After I’d worked with AI agents a little, I dove in with a big set of coding standards and practices and… I overdid it. I find I get better results by starting off with a “light touch” and letting it do what it wants, then correcting it where it gets off track (like using Python for something that needs efficient performance…).

        • FaceDeer@fedia.io · 2 days ago

          No, I’ve used them plenty before. I just found them to generally be a huge hassle of minimal benefit. They became much more useful in the context of agentic coding, where you want the agent to be able to immediately realize “oh, this change I made causes these specific problems when it’s run.” The hassle is all on the agent, not on me.

            • MangoCats@feddit.it · 11 hours ago

            The hassle is all on the agent, not on me.

            So much this. That hassle lands on the agent, and a few minutes of me waiting for it to crunch through the unit tests saves me tons of hassle later - no more going in circles re-fixing problems that were already fixed once.

            Same goes for keeping implementation code and documentation in sync. I’ve got hundreds of out-of-date wiki pages that simply aren’t worth my time to fix by hand, but when it’s the agent keeping the docs in sync, I just tell it to do it and wait a few minutes - totally worth the effort.
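
            That standing instruction can be a single line in the agent’s general rules; hypothetical wording (the docs/ path is invented):

            ```
            After changing any code, update the matching page under docs/ to describe
            the new behavior; list any pages you could not reconcile instead of guessing.
            ```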

              • FaceDeer@fedia.io · 2 days ago

              Could be. I’m a professional programmer whose usage runs the whole gamut: large applications with hundreds of programmers working on them for years, smaller apps I make for my own use, and one-off scripts that do some particular task and then generally get thrown away.

              I don’t do unit tests for that last category, of course. I don’t even use coding agents for those, generally speaking - a bit of back-and-forth in a chat interface is usually enough there.

                • neclimdul@lemmy.world · 2 days ago

                Is this like a who’s-got-a-bigger-portfolio situation? I’m not sure how to respond.

                I guess I’ve been developing for decades, including consulting for Page 6 and a stint in R&D at Sony Music. One of my open source contributions was used as part of the backend for one of Obama’s State of the Union addresses. I spend my time these days writing and maintaining multiple software stacks that integrate across multiple platforms.

                  • FaceDeer@fedia.io · 2 days ago

                  Since you brought up the notion that we might be doing different styles of development, I was giving you context on the kinds of development I do. It sounds like we might not be working at such different scales after all, but I couldn’t have known that until you gave that information just now.

                  This isn’t supposed to be some kind of duel or argument; I don’t see the point of that. I’m just explaining my usage of coding agents, and specifically of unit tests in that context, since that’s what you were questioning.

                    • neclimdul@lemmy.world · 2 days ago

                    I see. It seemed more like a weird flex.

                    Anyways, I couldn’t possibly deploy with any confidence a large project - or honestly a small project I expected someone to rely on - without layers of tests. Unintended consequences of even a small change are just a reality. And with the expectation to move quickly with large legacy systems, working without tests is a dangerous high-wire act.

                  • aesthelete@lemmy.world · 2 days ago

                  Wow what a circlejerk this turned into.

                  Oh well, I guess that’s what everything really is the whole time.

      • dreamkeeper@literature.cafe · 2 days ago

        We have ours configured with our coding standards and MCPs, and we have a skill library.

        It still outputs code full of mistakes. Usually they’re minor mistakes, but not always.

        When we use it to fix defects, it usually fixes the problem, but not in a very robust way. It still needs a lot of supervision to output quality code. For example, it will often spot-fix a defect instead of applying the principle of the fix to other areas that also need it (e.g. we needed to normalize some data, but it only did so in one place, because the ticket only mentioned that one place - even though that data is used elsewhere as well).
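
        A hypothetical Python sketch of that failure mode (all names invented): the model patches the one call site the ticket names instead of pushing the fix into a shared helper used everywhere the data enters:

        ```python
        def save_user(email: str) -> None: ...    # stand-ins for real persistence code
        def update_user(email: str) -> None: ...

        # What the agent did: normalize only at the call site the ticket mentioned.
        def handle_signup(raw_email: str) -> None:
            save_user(raw_email.strip().lower())  # spot fix, applied here only

        # What was wanted: one shared helper, applied at every entry point.
        def normalize_email(raw_email: str) -> str:
            return raw_email.strip().lower()

        def handle_signup_v2(raw_email: str) -> None:
            save_user(normalize_email(raw_email))

        def handle_import_v2(raw_email: str) -> None:
            update_user(normalize_email(raw_email))  # the call site the ticket missed
        ```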

        It’s a helpful tool for sure, but it’s rare that I don’t need to make corrections.

      • Badabinski@kbin.earth · 2 days ago

        I’ll say that during a recent week when I was forced to use an LLM, I found Claude Opus to be extremely poor at referencing this guide: https://mywiki.wooledge.org/BashPitfalls

        It took almost an hour to get Claude to write a shell script I considered to be of acceptable quality. It completely hallucinated several of the points in that guide, and I had to go read the guide myself to confirm that the model was making things up. The same task would have taken me about five minutes.

        I believe that GIGO applies here. 99% of the shell scripts on the internet are unsafe and terrible (looking at you, set -euo pipefail), so Claude is that much more likely to generate god-awful garbage because of the inherent bias in its training data.

        And as for unit tests? IMO, anything other than property-based testing is irrelevant. If you’re using something like Pydantic, you can auto-generate a LOT of your tests from the rich type annotations available in that library, along with Hypothesis. I tend to write a testing framework once and then special-case property tests for the things that fall outside my models. None of this is super helpful for big ugly codebases with a lot of inertia around their practices, but that hasn’t been my environment, thankfully.
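
        A minimal sketch of that approach, assuming Pydantic v2 and Hypothesis (the User model is invented for illustration):

        ```python
        from hypothesis import given, strategies as st
        from pydantic import BaseModel

        class User(BaseModel):  # hypothetical model standing in for real ones
            name: str
            age: int

        # Property: serializing then re-validating any valid User is lossless.
        @given(st.builds(User, name=st.text(), age=st.integers(min_value=0, max_value=150)))
        def test_user_roundtrip(user: User) -> None:
            assert User.model_validate_json(user.model_dump_json()) == user
        ```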

          • MangoCats@feddit.it · 10 hours ago

          WTF are you expecting Claude to code in bash?

          I have found Sonnet and Opus to both be very capable in bash, but then, I don’t usually ask bash to do super-complex things - its syntax is just too screwy to think about big applications in it.

          I will say, you might be misguiding the LLM by filling its context full of bad examples before it starts. It’s kind of like the advice about not staring at a tree downslope while skiing: if you’re fixated on it, you’re MORE likely to hit it.

          • Badabinski@kbin.earth · 7 hours ago

            I gave it no advice, and all I wanted it to do was generate a script to tell me the file type of the newest file in the current directory. It was a very trivial piece of code. Each time it generated something I disliked, I told it “don’t do this, reference this guide for the correct thing to do,” or “don’t do that, do it in such a way that X happens.” It was like 20 lines of bash in the end.

            I was expecting it to write me a bash script because that’s the example that everyone, without fail, says will work well. “I just used Claude to write a little throwaway script to move some files around” were the exact words a colleague used.

            Bash is a shitty, unsafe language. I don’t write large programs in it. I expect “throwaway scripts” to still be written in a way that defends against all of the innumerable shitass foot guns present in the language. Claude was incapable of doing this in a reasonable time frame.

            I also dislike the Python and Go it generated, while we’re at it. It produces overly verbose, overly documented, poorly performing code. It was also fucking dog shit at referencing runbooks and documentation in a local folder when I was on call and responding to alerts.

            It sounds like you’re quite partial to Claude, and I hope it’s been a very good and helpful tool for you. I did not find it to be particularly helpful for me. It was very good at putting me in a sour mood, however.

          • Badabinski@kbin.earth · 2 days ago

            ShellCheck, while good, doesn’t capture all best practices in my opinion. There are many items in that doc which ShellCheck will happily allow, worst of all set -euo pipefail (the Wooledge wiki argues at length that set -e’s behavior is too unpredictable to rely on).

      • ExLisper@lemmy.curiana.net · 2 days ago

        Unit tests used to be more trouble than they were worth most of the time; now I love them.

        Sounds like you were writing bad unit tests and AI showed you how to do it right.