The 70% AI productivity myth: why most companies aren't seeing the gains

(sderosiaux.substack.com)

71 points | by chtefi 13 hours ago

25 comments

  • fancyfredbot 12 hours ago
    The METR study cited here is very interesting.

    "In the METR study, developers predicted AI would make them 24% faster before starting. After finishing 19% slower, they still believed they'd been 20% faster."

    I hadn't heard of this study before. Seems like it's been mentioned on HN before but didn't get much traction.

    • simonw 11 hours ago
      I see it brought up almost every week! It's a firm favorite of the "LLMs don't actually help write code" contingent, probably because there are very few other credible studies they can point to in support of their position.

      Most people who cite it clearly didn't read as far as the table where METR themselves say:

      > We do not provide evidence that:

      > 1) AI systems do not currently speed up many or most software developers. Clarification: We do not claim that our developers or repositories represent a majority or plurality of software development work

      > 2) AI systems do not speed up individuals or groups in domains other than software development. Clarification: We only study software development

      > 3) AI systems in the near future will not speed up developers in our exact setting. Clarification: Progress is difficult to predict, and there has been substantial AI progress over the past five years [3]

      > 4) There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Clarification: Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup

      https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

      • fancyfredbot 11 hours ago
        Weird, you shouldn't really need to list the things your study doesn't prove! I guess they anticipated that the study might be misrepresented and wanted to get ahead of that.

        Their study still shows something interesting, and quite surprising. But if you choose to extrapolate from this specific setting and say coding assistants don't work in general then that's not scientific and you need to be careful.

        I think the study should probably decrease your prior that AI assistants actually speed up development, even if developers using AI tell you otherwise. The fact that it feels faster when it is actually slower is super interesting.

        • simonw 11 hours ago
          The lesson I took from the study is that developers are terrible at estimating their own productivity based on a new tool.

          Being armed with that knowledge is useful when thinking about my own productivity, as I know that there's a risk of me over-estimating the impact of this stuff.

          But then I look at https://github.com/simonw which currently lists 530 commits over 46 repositories for the month of December, which is the month I started using Opus 4.5 in Claude Code. That looks pretty credible to me!

          • Snuggly73 6 hours ago
            I am going to preface this by saying that I could be completely wrong.

            Simon - you are an outlier in the sense that basically your job is to play with LLMs. You don't have stakeholders with requirements that they themselves don't understand, you don't have to go to meetings, deal with a team, shout at people, do PRs etc., etc. The whole SDLC/process of SWE is compressed for you.

            • simonw 5 hours ago
              That's mostly (though not 100%) true, and a fair comment to make here.

              Something that's a little relevant to how I work here is that I deliberately use big-team software engineering methods - issue trackers, automated tests, CI, PR code reviews, comprehensive documentation, well-tuned development environments - for all of my personal projects, because I find they help me move faster: https://simonwillison.net/2022/Nov/26/productivity/

              But yes, it's entirely fair to point out that my use of LLMs is quite detached from how they might be used on large team commercial projects.

          • fancyfredbot 10 hours ago
            That's certainly an impressive month! However, it's conceivable that you are an outlier (in the best possible way!)

            I liked the way they did that study and I would be interested to see an updated version with new tools.

            I'm not particularly sceptical myself and my guess is that using Opus 4.5 would probably have produced a different result to the one in the original study.

            • simonw 10 hours ago
              I'm definitely an outlier - I've been pushing the boundaries of these tools for three years now and this month I've been deliberately throwing some absurdly ambitious problems at Opus 4.5 (like this one: https://static.simonwillison.net/static/2025/claude-code-mic...) to see how far it can go.
              • fancyfredbot 9 hours ago
                Very interesting example. It's an insanely complex task even with a reference implementation in another language.

                It's surprising that it manages the majority of the test cases but not all of them. That's not a very human-like result. I would expect humans to be bimodal with some people getting stuck earlier and the rest completing everything. Fractal intelligence strikes again I guess?

                Do you think the way you specified the task at such a high level made it easier for Claude? I would have probably tried to be much more specific for example by translating on a file by file or function by function basis. But I've no idea if this is a good approach. I'm really tempted to try this now! Very inspiring.

                • simonw 8 hours ago
                  > Do you think the way you specified the task at such a high level made it easier for Claude?

                  Absolutely. The trick I've found works best for these longer tasks is to give it an existing test suite and a goal to get those tests to pass, see also: https://simonwillison.net/2025/Dec/15/porting-justhtml/

                  In this case ripping off the MicroQuickJS test suite was the big unlock.

                  I have a WebAssembly runtime demo I need to publish where I used the WebAssembly specification itself, which it turns out has a comprehensive test suite built in as well.

          • kwertyoowiyop 9 hours ago
            In the 80s, when the mouse was just becoming common, there was a study comparing programming using a mouse vs. just a keyboard. Programmers thought they were faster using a keyboard, but they were actually faster using a mouse.
            • logicprog 6 hours ago
              That's the Ask Tog "study"[1]. It wasn't programmers, just regular users. The problem is he just asserts it without showing the underlying study or data, and of course Apple at the time of the Macintosh's development would have had a strong motivation to prove mousing superior to keyboarding to skeptical users. Additionally, the experience level of the users was never specified.

              [1]: https://www.asktog.com/TOI/toi06KeyboardVMouse1.html

            • harvey9 6 hours ago
              This surprises me because at the time user interfaces were optimised for keyboard - the only input device most people had. Also screen resolutions were lower so there were fewer things you could click on anyway.
          • pydry 11 hours ago
            The lesson I learned is that agentic coding uses intermittent reinforcement to mimic a slot machine.

            It (along with the hundreds of billions in investment hinging on it) explains the legions of people online who passionately defend their "system". Every gambler has a "system" and they usually earnestly believe it is helping them.

            Some people even write popular (and profitable!) blogs about playing slots machines where they share their tips and tricks.

            • logicprog 6 hours ago
              I really wish this meme would die.

              We know LLMs follow instructions meaningfully and relatively consistently; we know they are in-context learners and also pull from their context window for knowledge; we also know that prompt phrasing and especially organization can have a large effect on their behavior in general; we know from first principles that you can improve the reliability of their results by putting them in a loop with compilers / linters / tests because they do actually fix things when you tell them to. None of this is equivalent to a gambler's superstitions. It may not be perfectly effective, but neither are a million other systems and best practices and paradigms in software.

              Also, it doesn't "use" anything. It may be a feature of the program but it isn't intentionally designed that way.

              Also, who sits around rerunning the same prompt over and over again to see if you get a different outcome like it's a slot machine? You just directly tell it to fix whatever was bad about the output and it does so. Sometimes initial outputs have a larger or smaller amount of bad, but still. It isn't really analogous to a slot machine.

              Also, you talk as if the whole "do something -> might work / might not, stochastic to a degree, but also meaningfully directable -> dopamine rush if it does; if not goto 1" loop isn't inherent to coding lol

              • pydry 4 hours ago
                I don't think the "meme" that LLMs follow instructions inconsistently will ever die, because they do. It's in the nature of how LLMs function under the hood.

                >Also who sits around rerunning the same prompt over and over again to see if you get a different outcome like its a slot machine?

                Nobody. Plenty of people do like to tell the LLM that somebody might die if they don't do X properly, and other such faith-based interventions with their "magic box" though.

                Boy do their eyes light up when they hit the "jackpot", too (LLM writes what appears to be the correct code on the first shot).

                • simonw 4 hours ago
                  They're so much more consistent now than they used to be. New model releases almost always boast about how much better they are at "instruction following" and it really shows; I find Claude 4.5 and GPT-5.x models do exactly what I tell them to most of the time.
          • mrwrong 7 hours ago
            [dead]
      • mossTechnician 5 hours ago
        METR has some substantial AI industry ties, so I wonder if those clarifications (especially the one pointing at their own studies describing AI progress) are a way to mitigate concerns that industry would have with the apparent results of this study.
    • Sharlin 12 hours ago
      Plenty of people have been (too) quick to dismiss that study as not generally applicable because it was about highly experienced OSS devs rather than your average corporation programmer drone.
      • _aavaa_ 12 hours ago
        The issue I have with the paper is that it seems (based on my skimming) that they did not pick developers who were already versed with AI tooling. So they're comparing (experienced dev working in the way they're comfortable) vs (experienced dev working with new tool for the first time and not having passed the productivity slump from onboarding).
        • pydry 10 hours ago
          The thing I find interesting is that there are trillions of dollars in valuations hinging on this question, and yet the appetite to spend a little bit of money to repeat this study and then release the results publicly is apparently very low.

          It reminds me of global warming, where on one side of the debate there were some scientists with very little money running experiments, and on the other side there were some ridiculously wealthy corporations publicly poking holes in those experiments but who secretly knew they were valid since the 1960s.

          • Terr_ 8 hours ago
            Yeah, it's kind of a Bayesian probability thing, where the impressiveness of either outcome depends on what we expected to happen by default.

            1. There are bajillions of dollars in incentives for a study declaring "Insane Improvements", so we should expect a bunch to finish being funded, launched, and released... Yet we don't see many.

            2. There is comparatively no money (and little fame) behind a study saying "This Is Hot Air", so even a few seem significant.

        • Sharlin 11 hours ago
          Longitudinal studies are definitely needed, but of course at the time the research for this paper was done there weren't any programmers experienced with AI assist out there yet.
      • fancyfredbot 12 hours ago
        That's interesting context for sure, but the fact these were experienced developers makes it all the more surprising that they didn't realise the LLM slowed them down.
        • Sharlin 11 hours ago
          Measuring programming productivity in general is notoriously difficult, subjectively measuring your own programming productivity is even worse. A magic LoC machine saying brrrrrt gives an overoptimistic sense of getting things done.
    • MagicMoonlight 5 hours ago
      I can believe it.

      It will zero-shot a full system for you in 5 minutes, but then if you ask for a minor change to that system it will completely shit the bed.

      And you have no understanding of what it has written, so you’d have to check everything.

  • orwin 12 hours ago
    If we take out most of the frontend work, and the easy backend/Ops tasks where writing the code/config is 99% of the work, I think my overall productivity with the latest gen (basically Opus 4.5) improved by 15-20%. I am also _very_ sure that with the previous generation (Sonnet 4, Sonnet 4.5, Codex 5.1), my team's overall velocity decreased, even taking into account the frontend and the "easy" tasks. The amount of production bugs we had to deal with this year is crazy. Too much code is generated, and me and the other senior on my team just can't carefully review everything; we have to trust sometimes (especially data structures).

    The worst part is reading a PR and catching a reintroduced bug that was fixed a few commits ago. The first time, I almost lost my cool at work and said a negative thing to a coworker.

    This would be my advice to juniors (and I mean basically: devs who don't yet understand the underlying business/architecture): use the AI to explain how stuff works, generate basic functions maybe, but write code logic/algorithms yourself until you are sure you understand what you're doing and why. Work and reflect on the data structures by yourself, even if they were generated by the AI, and ask for alternatives. Always ask for alternatives, it helps understanding. You might not see huge productivity gains from AI, but you will improve first, and then productivity will improve very fast, from your brain first, then from AI.

    • sokoloff 9 hours ago
      > The worst part is reading a PR and catching a reintroduced bug that was fixed a few commits ago. The first time, I almost lost my cool at work and said a negative thing to a coworker.

      Losing your cool is never a good idea, but this is absolutely a time when you should give negative feedback to that coworker.

      Feedback is what reviews are for; in this case, this aspect of the feedback should neither be positive nor neutral.

    • amtamt 10 hours ago
      >> Kernighan's Law - Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

      Now the question is:

      Is AI providing solutions smarter than the developer using it might have produced?

      And perhaps more importantly, how much time does it take for the AI to write the code and a human to debug it, even if both are producing equally smart solutions?

    • mapontosevenths 11 hours ago
      Just to add to your advice to juniors working with AI:

      * Force the AI to write tests for everything. Ensure those tests function. Writing boring unit tests used to be arduous. Now the machine can do it for you. There's no excuse for a code regression making its way into a PR because you actually ran the tests before you did the commit, right? Right? RIGHT?

      * Force the AI to write documentation and properly comment code, then (this is the tricky part) you actually read what it said it was doing and ensure that this is what you wanted it to do before you commit.

      Just doing these two things will vastly improve the quality and prevent most of the dumb regressions that are common with AI generated code. Even if you're too busy/lazy to read every line of code the AI outputs just ensuring that it passes the tests and that the comments/docs describe the behavior you asked for will get you 90% of the way there.
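
      A minimal sketch of the "run the tests before you commit" half, with hypothetical names (the invoicing module and apply_discount function are made up): a regression test pinned to a bug that was already fixed once, so the suite fails loudly if the AI reintroduces it.

          import pytest
          from invoicing import apply_discount  # hypothetical module under test

          def test_discount_applied_once_per_order():
              # Regression test for a previously fixed bug: a 10% discount was
              # being applied once per line item instead of once per order.
              assert apply_discount(200.00, percent=10) == pytest.approx(180.00)

      Running the suite locally (and in CI) before every commit is what makes the "force the AI to write tests" advice actually pay off.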

      • AnimalMuppet 11 hours ago
        And, you actually wrote the regression test when you fixed the bug, right? Right?
        • ponector 7 hours ago
          I had a colleague, a senior software developer with a master's degree in CS, who said: why should I write tests if I can write a new feature to close the sprint scope faster?

          The irony is that when the company laid him off due to covid, the team's actual velocity increased.

      • syspec 11 hours ago
        Sometimes the AI is all too good at writing tests.

        I agree with the idea, I do it too, but you need to make sure the tests don't just validate the incorrect behavior, or that the code isn't updated to pass the tests in a way that actually "misses the point".

        I've had this happen to me on one or two tests every time

        • aisisiiaai 11 hours ago
          Even more important, those tests need to be useful. Often unit tests are simply testing that the code works as written, which generally does more harm than good.

          To give some further advice to juniors: if somebody is telling you writing unit tests is boring, they haven’t learned how to write good tests. There appears to be a large intersection between devs who think testing is a dull task and devs who see a self proclaimed speed up from AI. I don’t think this is a coincidence.

          Writing useful tests is just as important as writing app code, and should be reviewed with equal scrutiny.

        • mapontosevenths 11 hours ago
          I agree 100%.

          For some reason Gemini seems to be worse at it than Claude lately. Since mostly moving to Gemini 3, I've had it go back and change the tests rather than fix the bug on what seems to be a regular basis. It's like it's gotten smart enough to "cheat" more. You really do still have to pay attention that the tests are valid.

  • nineteen999 23 minutes ago
    Think this is a stupid point in time to be measuring any changes in productivity with anything other than wide-eyed interest. The tools may have improved a lot over the past year but they are still embryonic.

    Way too early to be jumping to any conclusions about this IMHO.

  • everdrive 11 hours ago
    A lot of the time, AI allows you to exercise basic competence at tasks for which you'd otherwise be incompetent. I think this is why it feels so powerful. You can jump into more or less any task below a certain level of complexity. (eg: you're not going to write an operating system with an LLM but you can set up and configure Wordpress if you'd never done it before.)

    I think for users this _feels_ incredibly powerful, however this also has its own pitfalls: Any topic which you're incompetent at is one which you're also unequipped to successfully review.

    I think there are some other productivity pitfalls for LLMs:

    - Employees use it to give their boss emails / summaries / etc in the language and style their boss wants. This makes their boss happy, but doesn't actually modify productivity whatsoever since the exercise was a waste of time in the first place.

    - Employees send more emails, and summarize more emails. They look busier, but they're not actually writing the emails or really reading them. The email volume has increased, however the emails themselves were probably a waste of time in the first place.

    - There is more work to review all around and much of it is of poor quality.

    I think these issues play a smaller part than some of the general issues raised (eg: poor quality code / lack of code reviews / etc.) but are still worth noting.

    • jennyholzer3 11 hours ago
      "There is more work to review all around and much of it is of poor quality."

      This is the average software developer's experience of LLMs

    • iwontberude 7 hours ago
      Writing an operating system with an LLM is a fantastic idea.
      • MagicMoonlight 5 hours ago
        I reckon it would do it well because they have ripped the entirety of GitHub, Linux etc.

        It basically just needs to recite things it has already seen.

    • AnimalMuppet 11 hours ago
      It's like Excel: It's really powerful to enable someone who actually knows what needs done to build a little tool that does that thing. It often doesn't have to be professional-quality, let alone perfect. It just has to be better than doing the same thing manually. There are massive productivity gains to be had there... for people with that kind of problem.

      This is completely orthogonal to productivity gains for full time professional developers.

    • sph 5 hours ago
      > AI allows you to exercise basic competence at tasks for which you'd otherwise be incompetent.

      Ah yes, the Dunning-Kruger as a service.

  • mashlol 12 hours ago
    AI almost always reduces the time from "I need to implement this feature" to "there is some code that implements this feature".

    However in my experience, the issue with AI is the potential hidden cost down the road. We either have to:

    1. Code review the AI generated code line by line to ensure it's exactly what you'd have produced yourself when it is generated or

    2. Pay an unknown amount of tech debt down the road when it inevitably wasn't what you'd have done yourself and it isn't extensible, scalable, well written code.

    • kentm 8 hours ago
      #2 is happening a lot more than people think. It’s incredibly hard to quantify tech debt in software and so as a result productivity measurements are pretty inaccurate. Even without AI there is a trend of devs writing a barely working system and then throwing it over the wall to “maintenance programmers”. Said devs are often rated highly by management as being productive compared to the “maintenance devs,” but all they really did was make other people deal with their garbage. I’ve seen these sorts of systems take months to years to be production ready while the original dev is already off to their new gig (and maybe cluelessly bragging on HN about how much better they are than the people cleaning up their mess).

      To get an accurate productivity metric you’d have to somehow quantify the debt and “interest” vs some alternative. I don’t think that’s possible to do, so we’re probably just going to keep digging deeper.

    • brightball 12 hours ago
      Exactly. Optimizations in one area will simply move the bottleneck so in order to truly recognize gains you have to optimize the entire software pipeline.
      • nradov 12 hours ago
        Exactly right. It turns out that writing code is hardly ever the real bottleneck. People should spend some time learning the basics of queueing theory.

        http://lpd2.com/

    • jimbo808 12 hours ago
      RE 2: It's not that far down the road either. Lazily reviewed or unreviewed LLM code rapidly turns your codebase into an absolute mess that LLMs can't maintain either. Very quickly you find yourself with lots of redundant code and duplicated logic, random unused code that's called by other unused code that gets called inside a branch that only tests will trigger, stuff like that. Eventually LLMs start fixing the code that isn't used and then confidently report that they solved the problem, filling up a context window with redundant nonsense every prompt, so they can't get anywhere. Yolo AI coding is like the payday loan of tech debt.
      • DougN7 11 hours ago
        This can happen sooner than you think too. I asked for what I thought was a simple feature and the AI wrote and rewrote a number of times trying to get it right, and eventually (not making this up) it told me the file was corrupt and could I please restore it from backup. This happened within about 20-30 minutes of asking for the change.
      • jennyholzer3 11 hours ago
        This is why I say LLMs are for idiots
    • linsomniac 11 hours ago
      >Code review the AI generated code line by line

      Have you considered having AI code review the AI code before handing it off to a human? I've been experimenting with having Claude work on some code and commit it, and then having Codex review the changes in the most recent git commit, then eyeballing the recommendations and either having Codex work the changes, or giving them back to Claude. That has seemed to be quite effective so far.
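
      A rough sketch of that second-opinion pass, assuming Claude Code's non-interactive -p flag (the exact CLI and flags are an assumption; substitute Codex or whichever reviewer you prefer):

          import subprocess

          # Grab the diff of the most recent commit (the change the first agent just made).
          diff = subprocess.run(
              ["git", "diff", "HEAD~1", "HEAD"],
              capture_output=True, text=True, check=True,
          ).stdout

          # Ask a second model to review it; `claude -p` runs a single prompt
          # non-interactively and prints the response.
          review = subprocess.run(
              ["claude", "-p", "Review this diff for bugs, duplicated logic, "
               "and missed edge cases:\n\n" + diff],
              capture_output=True, text=True, check=True,
          ).stdout

          # Eyeball the recommendations before feeding them back to either agent.
          print(review)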

      Maybe it's turtles all the way down?

  • ezoe 12 hours ago
    If anyone ever wonders why they don't see productivity improvements, they really need to read The Mythical Man-Month.

    Garage Duo can out-compete corporate because there is less overhead. But Garage Duo can't possibly output the sheer amount of work that corporate can.

    • fancyfredbot 12 hours ago
      In my view the reasons why LLMs may be less effective in a corporate environment are quite different from the human factors in The Mythical Man-Month.

      I think that the reason LLMs don't work as well in a corporate environment with large codebases and complex business logic, but do work well in greenfield projects, is linked to the amount of context the agents can maintain.

      Many types of corporate overhead can be reduced using an LLM. Especially following "well meant but inefficient" process around JIRA tickets, testing evidence, code review, documentation etc.

      • pigpop 11 hours ago
        I've found that something very similar to those "inefficient" processes works incredibly well when applied to LLMs. All of those processes are designed to allow for seamless handoff to different people who may not be familiar with the project or code which is exactly what an LLM behaves like when you clear its context.
      • nradov 11 hours ago
        The limited LLM context windows could be an argument in favor of a microservices architecture with each service or library in its own repository.
        • jdlshore 8 hours ago
          That just moves the complexity to the interactions between repositories, where it’s more difficult to understand and fix.
    • jennyholzer3 11 hours ago
      the productivity improvement is the Big Lie
    • kamaal 11 hours ago
      >>there is less overhead.

      There have been methods to reduce overhead available throughout the history of our industry. Unfortunately, almost all the time it involves using productive tools that would in some way reduce the head count required to do large projects.

      The way this works is you eventually have to work with languages like Lisp, Perl, Prolog, and then someone comes up with a theory that programming must be optimised mostly for beginners and power tooling must be avoided. Now you are forced to use verbose languages, where writing, maintaining and troubleshooting take a lot of people.

      The thing is this time around, we have a way to make code by asking an AI tool questions. So you get the same effect but now with languages like JS and Python.

  • zihotki 11 hours ago
    Sounds like an AI-slop-ish article. A whole section about "Why most enterprises don't" with many words but no actual data or analysis. Just assumptions based on an orthogonal report.

    AI won't give you much productivity if the problem you're challenged with is the human problem. That could happen both to startups and enterprises.

  • jaredcwhite 10 hours ago
    If people need AI assistance to handle all their "boilerplate" all the time, the much larger problem is needing so much damn boilerplate written all the time.

    The job of anyone developing an application framework, whether that's off the shelf or in-house, is to reduce the amount of boilerplate any individual developer needs to write to an absolute bare minimum. The ultimate win isn't to get "AI to write all your boilerplate." It's to not need to write boilerplate at all.

    • joleyj 6 hours ago
      I really don't mind boilerplate nearly as much as most people here on HN seem to. To me it's really no biggie if it helps structure things and make them explicit. I think it kind of goes along with the idea that typing code is not what takes the largest amount of time when you're doing software development. But the fact that I prefer explicit over implicit is another area where I think I diverge from the HN herd.
  • nen-nomad 5 hours ago
    Claude Code with Opus models has definitely reduced our TTM. It took us some time to build processes around it. It freed our resources to focus on tasks such as crafting better user journeys and marketing plans.

    One thing I am not sure about is the debt we are accumulating by allowing AI agents to write and maintain the code. In the short term, it is boosting our speed, but in the long run, we may suffer.

    But the product works well, and our users are happy with the experience.

    I have been a programmer for three long decades, so I have mixed feelings about this. But some days I see the writing on the wall.

    • Denzel 4 hours ago
      Asking as an eng that's starting to drive daily with CC:

      - How much has your TTM reduced by? How did you measure?

      - What's the net difference when you factor in token spend expenses?

      - By how much can Anthropic increase prices before crossing over your break-even point?

  • keeda 5 hours ago
    TFA is directionally correct, though it repeats a few cliches which are no longer accurate. E.g. people and some empirical data report improved productivity even on large, brownfield codebases, with the caveat that effectiveness seems to be related more to the quality processes around the code rather than the code itself.

    However, TFA is absolutely correct that it takes a long time to master this technology.

    A second, related point is that users have to adapt themselves to the technology to fully harness it. This is the hardest part. As an example, after writing OO code for my entire career, I use much more of a functional programming style these days because that's what gets the best results from AI for me.

    In fact, if you look at how the most effective users of AI agents do coding, it is nothing like what we are used to. It's more like a microcosm of all the activities that happen around coding -- planning, research, discussions, design, testing, review, etc -- rather than the coding itself. The closest analogy I can think of is the workstyle of senior / staff engineers working with junior team members.

    Similarly, organizations will have to rethink their workflows and processes from the ground-up to fully leverage AI. As a trivial example, tasks that used to take days and meetings can now take minutes, but will require much more careful review. So we need support for the humans-in-the-loop to do this efficiently and effectively, e.g. being able to quickly access all the inputs that went into the AI's work product, and spot-check them or run custom validations. This kind of infra would be specific to each type of task and doesn't exist yet and needs to be built.

    Just foisting a chatbot on employees is not helpful at all, especially as a top-down mandate with no guidance or training or dedicated time to experiment AND empowerment to shake things up. Without that you will mostly get poor results and resentment against AI, which we are already seeing.

    It's only 3 years since ChatGPT was released, so it is still very early days. Given how slow most organizations move, I'm actually surprised that any of them are reporting positive results this early.

  • turlockmike 12 hours ago
    When producing code is cheap, you can spend more time on verification testing.

    Force the LLM to follow a workflow, have it do TDD, use task lists, have it write implementation plans.

    LLMs are great coders but subpar developers; help them be a good developer and you will see massive returns.
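
    A minimal sketch of that kind of workflow harness; ask_agent is a placeholder for whatever coding agent you drive (Claude Code, Codex CLI, ...), not a real API:

        import subprocess

        def run_tests():
            # Return (passed, output) from the project's test command.
            proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
            return proc.returncode == 0, proc.stdout + proc.stderr

        def ask_agent(prompt: str) -> None:
            ...  # placeholder: invoke your agent here

        ask_agent("Follow the implementation plan in PLAN.md. Write a failing test "
                  "for the next task, then make it pass. Do not modify existing tests.")

        for _ in range(3):  # bounded retries keep the loop from thrashing
            passed, output = run_tests()
            if passed:
                break
            ask_agent("The test run failed with the output below. Fix the code, "
                      "not the tests:\n\n" + output)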

    • pydry 11 hours ago
      Coz I have always done coding this way with humans, I started out using LLMs to do simple bits of refactoring where tests could be used to validate that the output still worked.

      I did not get the impression from this that LLMs were great coders. They would frequently miss stuff, make mistakes and often just ignore the instructions I gave them.

      Sometimes they would get it right, but not enough. The agentic coding loop still slowed me down overall. Perhaps if I were more junior it would have been a net boost.

  • neilwilson 9 hours ago
    What the AI speed increase on Greenfield projects with modern stacks does do is reduce the cost of replacement.

    Expect to see more “replace rather than repair” projects springing up

  • bulletsvshumans 12 hours ago
    I think coding agents require fundamentally different development practices in order to produce efficiency improvements. And just like any new tool, they benefit from wisdom in how they are applied, which we are just starting to develop as an industry. I expect that over time we will grow to understand and also expand the circumstances in which they are a net benefit, while also appreciating where they are a hindrance, leading to an overall efficiency increase as we avoid the productivity hit resulting from their misapplication.
  • scuff3d 2 hours ago
    Something not mentioned by the article is that the gains seen by "AI native startups" and greenfield projects are going to be paid for later. Anyone who's worked in software for a bit will tell you that the last 20% of a project is a bitch, and it's gonna be worse when you don't understand how anything actually works.

    Interestingly, I've worked both ends of the spectrum simultaneously over the last year. I've spent most of my time on a (mostly) legacy system we're adding capabilities to, and I've spent some overtime working on an R&D project for my company. In the first, AI has been of limited use. Mostly good for generating helper scripts and data generators, stuff where I don't care and just need a couple hundred lines of code. In the R&D project on the other hand we probably got a year's worth of work done in 2 months, but I can already see the problems. We are working in a space none of us are experts in and with a complex library we don't understand. AI got us to a demo of an MVP way quicker than we could have ourselves, but actually transitioning that to something useful is going to be a LOT of work.

  • sys_64738 5 hours ago
    AI code output is garbage which needs to be combed through by real programmers. That's the problem. AI will dumb down what it means to be a real programmer.
    • sph 5 hours ago
      What? AI will dumb down the average programmer, while making the “real” ones that know how to use their head, and not throw more code at the problem, even more valuable.

      That said, I don’t think people are writing rocket science if boilerplate code is such a bottleneck that you need to automate that part. I have said time and time again: my worth as an engineer has little to do with how many lines of code I churn out in a day, but rather all the hard thinking that’s done beforehand.

  • mattas 11 hours ago
    In my experience, it’s basically impossible to accurately measure productivity of knowledge work. Whenever I see a stat associated to productivity gain/loss I get skeptical.

    If you go the pure subjective route, I’ve found that people conflate “speed” or “productivity” with “ease.”

    • jdlshore 8 hours ago
      The METR study has a unique approach to measuring productivity. It’s the only one I’ve seen that holds water.

      I don’t think I can do the approach justice here, but the short version is that they have the developer estimate how long a change will take, then randomly assign that task to be completed with AI or normally and measure how long it actually takes. Afterwards, they compare the differences in the ratios of estimates to actuals.

      This gets around the problem of developer estimates being inaccurate by comparing ratios.
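
      A toy illustration of that ratio comparison (the numbers are invented and this is not METR's actual analysis, just the shape of the idea):

          # Each task: (developer's pre-task estimate in hours, actual hours taken).
          with_ai    = [(2.0, 2.6), (1.0, 1.3), (4.0, 4.9)]
          without_ai = [(3.0, 3.1), (2.0, 2.2), (1.5, 1.6)]

          def mean_ratio(tasks):
              # actual / estimate > 1 means the task took longer than predicted.
              return sum(actual / estimate for estimate, actual in tasks) / len(tasks)

          # Comparing ratios between the two arms cancels out each developer's
          # general tendency to under- or over-estimate.
          print(f"AI-allowed tasks ran at {mean_ratio(with_ai):.2f}x their estimates")
          print(f"AI-disallowed tasks ran at {mean_ratio(without_ai):.2f}x their estimates")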

  • ukuina 11 hours ago
    This article simply reinforces existing (and outdated) biases.

    Complex legacy refactoring + Systems with poor documentation or unusual patterns + Architectural decisions requiring deep context: These go hand in hand. LLMs are really good at pulling these older systems apart, documenting, then refactoring them, tests and all. Exacerbated by poor documentation of domain expectations. Get your experts in a room weekly and record their rambling ideas and history of the system. Synthesize with an LLM against existing codebase. You'll get to 80% system comprehension in a matter of months.

    Novel problem-solving with high stakes: This is the true bottleneck, and where engineers can shine. Risk assessment and recombination of ideas, with rapid prototyping.

  • chiengineer 12 hours ago
    Let's give 99% of the company devices with 16GB of RAM or less and force them to use 85% of it for security scans

    - corporate

    WHY CANT OUR DEVICES RUN TECHNOLOGIES ??????

    - also corporate

    • hn-acct 12 hours ago
      Actually though. We had one device that was over 10 years old, without any MDM etc., and it outperformed a new laptop building the same product because of the corporate antivirus crap.
      • HappySweeney 12 hours ago
        If you don't exclude your build folders from the scan it will slow everything down tremendously.
  • aisisiiaai 11 hours ago
    A key point missing from a lot of the AI debate is how much work is useless. It ranges from something as simple as a feature that's never turned on to the more extreme case of a job that doesn't need to exist.

    We have a lot of useless work being done, and AI is absolutely going to be a 10x speed up for this kind of work.

  • linsomniac 12 hours ago
    >The AI fluency tax. This isn't free to learn.

    In programming we've often embraced spending time to learn new tools. The AI tools are just another set of tools, and they're rapidly changing as well.

    I've been experimenting seriously with the tools for ~3 years now, and I'm still learning a lot about their use. Just this past weekend I started using a whole new workflow, and it one-shotted building a PWA that implements a fully-featured calorie tracking app (with social features, pre-populating foods from online databases, weight tracking and graphing, and avatars); it's on par with many I've used in the past that cost $30+/year.

    Someone just starting out at chat.openai.com isn't going to get close to this. You absolutely have to spend time learning the tooling for it to be at all effective.

  • mgrat 12 hours ago
    I've worked at a number of non-tech companies the past few years. They bought every SaaS product (Palantir, Databricks, multi-cloud), their dev teams adopted every pattern popularized by big tech, and the results were always mixed. Any gains were wiped out by being buried under technical debt. They had all the data catalogs & 'ontologies' with none of the governance to go make it work. Turns out that benefiting from all this tech requires you to re-organize and change your culture. For a lot of companies, they're just not going to see big gains from AI or tech in general at this point.
  • jennyholzer3 13 hours ago
    I think you'd have to be stupid to expect productivity gains from your software developers using LLMs

    edit: a lot of articles like this have been popping up recently to say "LLMs aren't as good as we hyped them up to be, but they still increase developer productivity by 10-15%".

    I think that is a big lie.

    I do not think LLMs have been shown to increase developer productivity in any capacity.

    Frankly, I think LLMs drastically degrade developer performance.

    LLMs make people stupider.

    • mapontosevenths 12 hours ago
      The thing you're likely missing is that you've forgotten what programming is at a high level.

      A program is a series of instructions that tell a computer how to perform a task. The specifics of the language aren't as important as the ability to use them to get the machine to perform the tasks instructed.

      We can now use English as that language, which allows more people than ever to program. English isn't as expressive as Python wielded by an expert, yet. It will be. This is bad for people who used to leverage the difficulty of the task to their own advantage, but good for everyone else.

      Also, keep in mind that today's LLMs are the worst they'll ever be. They will continue to improve, and you will stagnate if you don't learn to use the new tools effectively.

      • dustbunny 11 hours ago
        > you will stagnate if you don't learn to use the new tools effectively

        I've been going the other way, learning the old tools, the old algorithms. Specifically teaching myself graphics and mastering the C language. Tons of new grads know how to use Unity, how many know how to throw triangles directly onto the GPU at the theoretical limit of performance? Not many!

        • mapontosevenths 11 hours ago
          I did some of that when I was younger. I started with assembly and C, even though everyone told me to skip it and start with at least C++ or something further up the abstraction ladder. Ignoring them and gaining that knowledge has proven invaluable over the years.

          Understanding a "deeper" abstraction layer is almost always to your advantage, even if you seldom use it in your career. It just gives you a glimpse behind the curtain.

          That said, you have to also learn the new tools unless you tend to be a one man band. You'll find that employers don't want esoteric knowledge or all-knowing wizards who can see the matrix. Mostly, they just want a team member who can cooperate with other folks to get things done in whatever tool they can find enough skilled folks to use.

        • jennyholzer3 11 hours ago
          I think this guy is smarter than every LLM user in the thread
      • coffeefirst 11 hours ago
        > you will stagnate if you don't learn to use the new tools effectively.

        This is the first technology in my career where the promoters feel the need to threaten everyone who expresses any sort of criticism, skepticism, or experience to the contrary.

        It is very odd. I do not care for it.

        • jennyholzer3 11 hours ago
          "you will stagnate if you don't learn to use the new tools effectively."

          this hostile marketing scheme is the reason for my hostile opposition to LLMs and LLM idiots.

          LLMs do not make you smarter or a more effective developer.

          You are a sucker if you buy into the hype.

          • mapontosevenths 11 hours ago
            Are you arguing that you can work in technology without learning new things?

            Have you considered a career in plumbing? Their technology moves at a much slower rate and does not require you to learn new things.

            • coffeefirst 10 hours ago
              No... nobody has ever argued that.

              There's a debate to be had about what any given new technology is good for and how to use it because they all market themselves as the best thing since sliced bread. Fine. I use Sonnet all the time as a research tool, it's kind of great. I've also tried lots of stuff that doesn't work.

              But the attitude towards everyone who isn't an AI MAXIMALIST does not persuade anyone or contribute to this debate in any useful way.

              Anyway if I get kicked out of the industry for being a heretic I think I'll go open an Italian restaurant. That could be fun.

              • mapontosevenths 8 hours ago
                > There's a debate to be had about what any given new technology is good for and how to use it

                Fair enough. It's reasonable to debate it, and I'll agree that it's almost certainly overhyped at the moment.

                That said, folks like the GP who say that "LLMs do not make you smarter or a more effective developer" are just plain wrong. They've either never used a decent one, or have never learned to use one effectively and they're blaming the tool instead of learning.

                I know people with ZERO programming experience who have produced working code that they use every day. They literally went from 0% effective to 100% effective. Arguing that it didn't happen for them (and the thousands of others just like them) is just factually incorrect. It's not even debatable to anyone who is being honest with themselves.

                It's fair to say that if you're already a senior dev it doesn't make you super-dev™, but I doubt anyone is claiming that. For "real devs" they're claiming relatively modest improvements, and those are very real.

                > Anyway if I get kicked out of the industry for being a heretic I think I'll go open an Italian restaurant.

                I doubt anyone will kick you out for having a differing opinion. They'll more likely kick you out for being less productive than the folks who learned to use the new tools effectively.

                Either way, the world can always use another Italian restaurant, or another plumber. :)

            • jennyholzer3 10 hours ago
              I'm arguing that LLMs are overhyped garbage which frankly seem like a dead end for someone pursuing a career in software development
        • mapontosevenths 11 hours ago
          How old is your career then? I've been hearing some variation on "evolve or die" for about 30 years now, and it's been true every time... Except for COBOL. Some of those guys are still doing the same thing they were back then. Literally everything else has changed and the people that didn't keep up are gone.
    • amelius 12 hours ago
      I've seen interns (with an academic background) build advanced UIs for projects while not having a background in coding. This would not have been possible without LLMs.
      • jennyholzer3 12 hours ago
        Can they do it without LLMs?

        If they can't, did they really do it in the first place?

        Are they actually literate in the programming languages they're using?

        • koakuma-chan 12 hours ago
          I don't write any front-end code at work anymore. I use Figma MCP and Cursor and it can implement the design near perfectly on first try.
          • deaux 12 hours ago
            To be fair, this is presumably because a skilled human spends time properly making the design for you in Figma.
            • koakuma-chan 11 hours ago
              It doesn't really matter how "skilled" the designer is. Figma's MCP already provides HTML and CSS that's basically ready, and all the AI needs to do is translate that into React or whatever. Or if you mean that AI wouldn't be able to make a proper interface without the human, that's also not true. The only reason I use Figma MCP is that my company uses Figma and has a dedicated Figma person. My opinion is that this is just a bottleneck, and it would be easier to prompt the AI to make whatever interface.
              • deaux 11 hours ago
                > The only reason I use Figma MCP is that my company uses Figma and has a dedicated Figma person. My opinion is that is just a bottleneck, and it would be easier to prompt AI to make whatever interface.

                  Here's where our opinions differ - I think replacing that Figma person with AI prompts will negatively affect the product in a way that is noticeable to the end-user and affects their experience.

                It does of course depend what kind of product you're making, but I'd say most of the time this holds.

                • koakuma-chan 11 hours ago
                  I'm not even arguing that you should replace the Figma person with AI. I am arguing that even without AI, having Figma persons is a bottleneck. It is much faster to just use some kind of component library, like shadcn, and let the developer figure it out. And with AI that would be even faster, as the developer wouldn't have to write any code, just check the AI output and prompt to make changes if needed. Unless of course you need one of those really fancy landing pages, but even then, you would likely need a specialized developer, and not a Figma person.
                  • deaux 11 hours ago
                    If you work in B2B SaaS, sure, I guess. That's a lot of HN by virtue of being a lot of SF VC, but only a tiny part of all tech. Elsewhere shadcn isn't a realistic option.
                    • koakuma-chan 11 hours ago
                      I'm curious, where is "elsewhere"?
                      • deaux 6 hours ago
                        ...literally everything that isn't a recent, up-and-coming B2B SaaS. So >90% of the software written today.

                        To give but one example, effectively all of the >$300B mobile app market. Or all enterprise software that can't run on Electron. Or any company that cares about image/branding across their products, which is every single company past a certain size (and don't come at me with "but hot AI startup uses Shadcn and are valued at X trillion").

        • _aavaa_ 12 hours ago
          This is such a tired argument.

          Could people write scientific code without python? If they can't, did they really do it in the first place?

          Could people write code without use after free bugs without using a GC'd language? If they can't, did they really do it in the first place?

          Could people make a website without WYSIWYG editor? If they can't, did they really make a website?

          • jennyholzer3 12 hours ago
            I think LLMs have aggressively facilitated the rise of illiteracy in people attending software development university programs.

            I think graduates of these programs are far, far worse software developers than they were in the recent past.

            edit: I think you mean "irrelevant", not "irreverent". That being said, my response is an expansion of the point made in my comment that you replied to.

            • amelius 12 hours ago
              > I think LLMs have aggressively facilitated the rise of illiteracy in people attending software development university programs.

              But this subthread is about interns who did not study CS, and are able to create advanced UIs using LLMs in the short time they had left to finish their project.

            • _aavaa_ 12 hours ago
              I'll start by saying that this seems irreverent to my previous comment.

              That being said, I half agree but I think we see things differently. Based on what I've seen, the "illiterate" are those who would have otherwise dropped out or done a poor job previously. Now instead of exiting the field, or slowly shipping code they didn't understand (because that has always been a thing) they are shovelling more slop.

              That's a problem, but it's at most gotten worse rather than come out of thin air.

              But, there are still competent software engineers and I have seen with my own eyes how AI usage makes them more productive.

              Similarly, some of those "illiterate" are those who now have the ability to make small apps for themselves to solve a problem they would not be able to before, and I argue that's a good thing.

              Ultimately, people care about the solution to their problems, not the code. If (following the original anecdote) someone with an LLM can build a UI for their project I frankly don't think it matters whether they understood the code. The UI is there, it works, and they can get on with the thing that is actually important: using the UI for their bigger goal.

        • patmorgan23 5 hours ago
          Does it matter?
    • alooPotato 12 hours ago
      Do you think IDE's, type checking, refactoring tools and autocomplete make developers stupider too? Serious question.
      • jennyholzer3 12 hours ago
        not at all, I think these are valuable tools

        would you agree that LLMs make developers stupider?

        edit: answer my question

        • alooPotato 12 hours ago
          So what about Cursor's tab autocomplete? Seems like there is a spectrum of tooling between raw assembly all the way to vibe coding and I'm trying to see where you draw the line. Is it "if it uses AI, its bad" or are you more against the "hey build me something and I'm not even gonna check the results."
          • peteforde 12 hours ago
            ... they are not going to give you a satisfying answer to your totally reasonable line of inquiry.

            Looking at the brief history of their account, I don't think anything they are saying or asking is in remotely good faith.

    • simonw 12 hours ago
      Can you describe anything about the difference between using ChatGPT with GPT-4o to write code in January 2025 and using Opus 4.5 with Claude Code in December 2025?
      • jennyholzer3 12 hours ago
        [flagged]
        • peteforde 12 hours ago
          This isn't the smart, snappy reply that you believe it to be.

          As a comment reader this exchange with Simon translates directly to "no, but you have forced me to try and misdirect because I can't reply in good faith to an expert who has forgotten more about LLMs than I'll ever know".

          • jennyholzer3 12 hours ago
            what reads to you as "an expert who has forgotten more about LLMs than I'll ever know" reads to me as a crack cocaine smoker.

            just write the code

            • pigpop 12 hours ago
              The only person coming off as unhinged and out of touch with reality here is you.
        • simonw 12 hours ago
          Yeah, I don't think you've used any of the technology you are criticizing here.
    • ookblah 12 hours ago
      well you don't have to "think" anything lol, if you've never tried it yourself that's step one. step two is to not assume everyone out there is a shill or lying because that's awful convenient. also it's not black or white.

      developers can exist in a small team, solo, or a large enterprise, all with their own mandates and cultures, so just saying LLMs increase/decrease productivity is reductive.

      have a feeling i'm being trolled tho.

      • jennyholzer3 12 hours ago
        I'm not trolling, this is my sincere opinion

        I think LLM addicts are particularly susceptible to flattery.

        • CuriouslyC 12 hours ago
          You're alienating a lot of people just by calling them "LLM addicts," and I suspect you're not arguing in good faith given your language.

          There are a lot of sad people who have developed parasocial relationships with ChatGPT, but that's entirely a separate issue from whether agents are a good tool for software engineering.

        • peteforde 12 hours ago
          "Real" woodworkers often have similar reactions when they see people incorporating CNC machines into their creative process as you appear to have when it comes to your engagement style on this topic.

          They don't emerge looking credible, either.

          • jennyholzer3 11 hours ago
            equating a CNC router with Claude Code or Chat GPT is an egregious false equivalency.
  • nospice 12 hours ago
    Another day, another evidently AI-written article about AI on the front page of HN...
    • hackeman300 11 hours ago
      Yup, closed as soon as I saw the classic "it's not x, it's y" pattern.
    • fancyfredbot 8 hours ago
      Does it really matter? The article definitely shows signs of being LLM assisted but it doesn't read like pure slop to me. It reads more like the author used an LLM to summarise his thoughts.
      • nospice 5 hours ago
        If it feels like I'm reading a human essay that received some help from a copyeditor, no. If it feels like I'm reading the output of "hey LLM, write a contrarian take on the use of AI in business"... yeah, I think it matters, because it shifts the balance toward effortless production of endless HN bait.
  • robomartin 9 hours ago
    The key issue is that the current version of AI has no concept of understanding anything. Without understanding, anything is possible and bad outcomes are almost guaranteed outside of the trivial. Throw a non-trivial codebase at any AI tool and watch as it utterly destroys it, introduces lots of new bugs, adds massive amounts of bloat and, in general, makes it incomprehensible and impossible to support.

    I ran a three month experiment with two of our projects, one Django and the other embedded C and ARM assembler. You start with "oh wow, that's cool!" and not too long after that you end up in hell. I used both ChatGPT and Cursor for this.

    The only way to use LLMs effectively was to carefully select small chunks of code to work on, have it write the code and then manually integrate it into the codebase after carefully checking it and ensuring it didn't want to destroy 10 other files. In other words, use a very tight leash.

    I'm about to run a six month LLM experiment now. This time it will be Verilog FPGA code (starting with an existing project). We'll see how that goes.

    My conclusion at this instant in time is that LLMs are useful if you are knowledgeable and capable in the domain they are being applied to. If you are not, shit show potential is high.

  • josefritzishere 11 hours ago
    I think AI would have better general acceptance if we stopped mythologizing its utility. It's so wildly exaggerated that it can't ever live up to the hype. If AI can't adapt to a reality-based universe, the bubble is going to burst all the sooner.