How I write software with LLMs

(stavros.io)

74 points | by indigodaddy 4 hours ago

5 comments

  • jumploops 9 minutes ago
    This is similar to how I use LLMs (architect/plan -> implement -> debug/review), but after getting bitten a few times, I have a few extra steps in my process:

    The main difference between my workflow and the author's is that I have the LLM "write" the design/plan/open questions/debug notes/etc. into markdown files, for almost every step.

    This is mostly helpful because it "anchors" decisions into timestamped files, rather than just loose back-and-forth specs in the context window.

    Before the current round of models, I would religiously clear context and rely on these files for truth, but even with the newest models/agentic harnesses, I find it helps avoid regressions as the software evolves over time.
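    A minimal sketch of what the anchoring could look like (the file naming and directory layout are my own assumptions for illustration, not the commenter's actual setup):

```python
from datetime import datetime, timezone
from pathlib import Path

def anchor_decision(topic: str, body: str, docs_dir: str = "docs/decisions") -> Path:
    """Write one design/plan/open-questions note to a timestamped
    markdown file, so it survives context-window resets."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    slug = topic.lower().replace(" ", "-")
    path = Path(docs_dir) / f"{stamp}-{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# {topic}\n\n{body}\n", encoding="utf-8")
    return path
```

    The timestamp in the filename is what gives you an ordered trail of decisions to point a fresh context at.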

    A minor difference between me and the author is that I don't rely on specific sub-agents (beyond what the agentic harness has built in for e.g. file exploration).

    I say it's minor, because in practice the actual calls to the LLMs undoubtedly look quite similar (clean context window, different task/model, etc.).

    One tip, if you have access, is to do the initial design/architecture with GPT-5.x Pro, and then take the output "spec" from that chat/iteration to kick off a codex/claude code session. This can also be helpful for hard-to-reason-about bugs, but I've only done that a handful of times so far (i.e. a funky dynamic SVG-based animation snafu).

  • silisili 30 minutes ago
    I'm not sure the notion I keep seeing of "it's ok, we still architect, it just writes the code"(paraphrased) sits well with me.

    I've not tested it with architecting a full system, but assuming it isn't good at it today... it's only a matter of time. Then what is our use?

    • AstroBen 15 minutes ago
      Teaching it good judgement is orders of magnitude harder than teaching it to pass tests.

      Post-training a model heavily relies on verifiable feedback. There's a tonne of our work that doesn't have that.

      An LLM can write tests that pass, sure, but how do you know they're high-value tests, testing the right behavior?

      How do you know if a prompt is under-specified?

      How do you give it feedback to learn to create good abstractions?

      RLHF has been used for 4? years here now, and it still can't stop Opus from using dumb names for variables/functions and duplicating code all over the place. For predictable reasons: it's not an easily verifiable thing to train against.

      Even assuming it could do a perfect implementation, someone still needs to set up the harnesses surrounding it. You can't write tests without the architecture in mind, and well-designed code is significantly easier to test.

      It's not a human slowly getting smarter over time. We can somewhat predict the direction of improvement.

    • chii 15 minutes ago
      > Then what is our use?

      You will have to find new economic utility. That's the reality of technological progress; it's just that the tech and white-collar industries didn't think it could come for them!

      A skill that becomes obsolete is useless, obviously. There's still room for artisanal/handcrafted wares today, amid industrial-scale production, so I would assume similar levels for coding.

    • borski 23 minutes ago
      LLMs can build anything. The real question is what is worth building, and how it’s delivered. That is what is still human. LLMs, by nature of not being human, cannot understand humans as well as other humans can. (See every attempt at using an LLM as a therapist)

      In short: LLMs will eventually be able to architect software. But it’s still just a tool

      • silisili 20 minutes ago
        What is the use of a software engineer/architect at that point? It's a tool, but one that product or C-levels can use directly, as I see it.
        • 0xbadcafebee 11 minutes ago
          A software engineer will be a person who inspects the AI's work, same as a building inspector today. A software architect will co-sign on someone's printed-up AI plans, same as a building architect today. Some will be in-house, some will do contract work, and some will be artists trying to create something special, same as today. The brute labor is automated away, and the creativity (and liability) is captured by humans.
        • borski 18 minutes ago
          Yes, for building something

          But for building the right thing? Doubtful.

          Most of a great engineer’s work isn’t writing code, but interrogating what people think their problems are, to find what the actual problems are.

          In short: problem solving, not writing code.

  • christofosho 3 hours ago
    I like reading these types of breakdowns. Really gives you ideas and insight into how others are approaching development with agents. I'm surprised the author hasn't broken down the developer agent persona into smaller subagents. There is a lot of context used when your agent needs to write in a larger breadth of code areas (i.e. database queries, tests, business logic, infrastructure, the general code skeleton). I've also read[1] that having a researcher and then a planner helps with context management in the pre-dev stage as well. I like his use of multiple reviewers, and am similarly surprised that they aren't refined into specialized roles.

    I'll admit to being a "one prompt to rule them all" developer, and will not let a chat go longer than the first input I give. If mistakes are made, I fix the system prompt or the input prompt and try again. And I make sure the work is broken down as much as possible. That means taking the time to do some discovery before I hit send.

    Is anyone else using many smaller specific agents? What types of patterns are you employing? TIA

    1. https://github.com/humanlayer/advanced-context-engineering-f...

    • marcus_holmes 2 hours ago
      That reference you give is pretty dated now; it's based on a talk from August, which is the Beforetimes of the newer models that have brought such a step change in productivity.

      The key change I've found is really around orchestration - as TFA says, you don't run the prompt yourself. The orchestrator runs the whole thing. It gets you to talk to the architect/planner, then the output of that plan is sent to another agent, automatically. In his case he's using an architect, a developer, and some reviewers. I've been using a Superpowers-based [0] orchestration system, which runs a brainstorm, then a design plan, then an implementation plan, then some devs, then some reviewers, and loops back to the implementation plan to check progress and correctness.
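      Roughly, that loop can be sketched like this (the stage names follow the pipeline I described; the loop-until-approved shape and everything else here is my own sketch, not Superpowers' actual code):

```python
from typing import Callable, Dict

# Each stage takes the running "spec" text and returns an updated version.
Stage = Callable[[str], str]

def orchestrate(task: str, stages: Dict[str, Stage], max_loops: int = 3) -> str:
    """Brainstorm -> design -> implementation plan -> dev -> review,
    looping back to the implementation plan until a reviewer approves."""
    spec = stages["brainstorm"](task)
    spec = stages["design"](spec)
    code = ""
    for _ in range(max_loops):
        plan = stages["plan"](spec)
        code = stages["develop"](plan)
        verdict = stages["review"](code)
        if verdict == "approve":
            return code
        # Fold reviewer feedback back into the spec and re-plan.
        spec = spec + "\n\nReviewer feedback:\n" + verdict
    return code
```

      The point is that you never run the intermediate prompts yourself; the orchestrator carries each stage's output to the next agent automatically.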

      It's actually fun. I've been coding for 40+ years now, and I'm enjoying this :)

      [0] https://github.com/obra/superpowers

      • indigodaddy 2 hours ago
        Can you bolt superpowers onto an existing project so that it uses the approach going forward (I'm using Opencode), or would that get too messy?
    • felixsells 23 minutes ago
      re: breaking into specialized subagents -- yes, it matters significantly, but the splitting criteria aren't obvious at first.

      what we found: split on domain of side effects, not on task complexity. a "researcher" agent that only reads and a "writer" agent that only publishes can share context freely because only one of them has irreversible actions. mixing read + write in one agent makes restart-safety much harder to reason about.

      the other practical thing: separate agents with separate context windows helps a lot when you have parts of the graph that are genuinely parallel. a single large agent serializes work it could parallelize, and the latency compounds across the whole pipeline.
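      a toy sketch of the split (the class names and the key-value "store" are made up for illustration; the point is that only one side performs irreversible actions, and its writes are idempotent):

```python
class Researcher:
    """Read-only agent: may inspect anything, never mutates state,
    so it is always safe to re-run after a crash."""
    def __init__(self, store: dict):
        self._store = store

    def read(self, key: str):
        return self._store.get(key)

class Publisher:
    """Write-only agent: the only place irreversible actions happen.
    Skipping already-published keys makes retries safe."""
    def __init__(self, store: dict):
        self._store = store

    def publish(self, key: str, value) -> bool:
        if key in self._store:  # already published: no-op on retry
            return False
        self._store[key] = value
        return True
```

      with that split, you can restart the researcher freely and only have to reason carefully about the publisher.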

  • plastic041 31 minutes ago
    I wanted to know how to make software with LLMs "without losing the benefit of knowing how the entire system works" and stay "intimately familiar with each project’s architecture and inner workings", while "have never even read most of their code". (Because obviously, you can't.) But OP didn't explain that.

    You tell an LLM to create something, and then use another LLM to review it. That might make the result safer, but it doesn't mean that YOU understand the architecture. No one does.

    • ashwinsundar 28 minutes ago
      Hot take: you can't have your cake and eat it too. If you aren't writing code, designing the system, creating architecture, or even writing the prompt, then you're not understanding shit. You're playing slots with stochastic parrots

          The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
      
      - Karpathy 2025
      • simonw 21 minutes ago
        Your Karpathy quote there is out of context. It starts with: https://twitter.com/karpathy/status/1886192184808149383

          There's a new kind of coding I call "vibe
          coding", where you fully give in to the
          vibes, embrace exponentials, and forget
          that the code even exists.
        
        Not all AI-assisted programming is vibe coding. If you're paying attention to the code that's being produced, you can guide it towards being just as high quality as (or even higher quality than) code you would have written by hand.
        • ashwinsundar 5 minutes ago
          It's appropriate for the commenter I was replying to, who asked how they can understand things, "while having never even read most of their code."

          I like AI-assisted programming, but if I fail to even read the code produced, then I might as well treat it like a no-code system. I can understand the high-levels of how no-code works, but as soon as it breaks, it might as well be a black box. And this only gets worse as the codebase spans into the tens of thousands of lines without me having read any of it.

          The (imperfect) analogy I'm working on is a baker who bakes cakes. A nearby grocery store starts making any cake they want, on demand, so the baker decides to quit baking cakes and buy them from the store. The baker calls the store anytime they want a new cake, and just tells them exactly what they want. How long can that baker call themself a "baker"? How long before they forget how to even bake a cake, and all they can do is get cakes from the grocer?

  • indigodaddy 3 hours ago
    This was on the front page and then got completely buried for some reason. Super weird.
    • mjmas 2 hours ago
      On the front page at the moment. Position 12
      • indigodaddy 2 hours ago
        Maybe I missed it. Sometimes when you're scanning for something your brain intentionally doesn't want to see it, I've noticed. Anyway I'm not Stavros obviously, just thought this was a good article.