Karpathy on Programming: “I've never felt this much behind”

(twitter.com)

549 points | by rishabhaiover 42 days ago

82 comments

brandonmenc 39 days ago
I admit to pangs of this, but it's really never made any sense because the implication is that the profession is now magically closed off to newcomers.
Imagine someone in the 90s saying "if you don't master the web NOW you will be forever behind!" and yet 20 years later kids who weren't even born then are building web apps and frameworks.
Waiting for it to all shake out and "mastering" it then is still a strategy. The only thing you'll sacrifice is an AI funding lottery ticket.
[-]
- yoyohello13 39 days ago
  Finally a voice of reason. The tools will just get better and easier to use. I use LLMs now, but I'm not going to dump a bunch of time learning the new hotness. I'll let other people do that and pick up the useful pieces later.
  Unless you're gunning for a top position as a vibe coder, this whole concept of "falling behind" is just pure FOMO.
  [-]
  - estimator7292 39 days ago
    Same. I only just started using agents a few months ago.
    Earlier this year the ecosystem was still a mess I didn't have time to untangle. Now things are relatively streamlined and simple. Arguably stable, even.
    I feel behind, sure, but I also don't think people on the bleeding edge are getting that much more utility that it's worth sinking dozens or hundreds of my very limited hours into understanding.
    Besides, I'm a C programmer. I'll always be several decades behind the trend. I'm fine with that.
  - Ekaros 39 days ago
    Doing a small project for a customer. They have explicit instructions that I can't even use any unapproved AI... So, well, they are paying. Until it is actually forced, I see no pressure to move there.
    As for the rest of my field: automated tools do part of the work. AI can probably do some, but not enough of actually verifying findings and then properly explaining the context and implications.
  - hruhbuikr 38 days ago
    Yeah Karpathy is engaged here in more hype creation. Software engineers pretending they just smashed some particles together and there is a whole lot of new data to math out.
    It's high dose copium. Please keep the good times rolling! Buy my books! Sub to my stack!
    Meanwhile, with local models, local RAG, and shell scripts, I am wandering 3D immersive worlds via a GPU accelerated presentation layer I vibe coded with a single 24GB GPU. Natural language driven Unreal engines are viable outputs today given local only code gen.
    Karpathy and the SV VC world thought this would be the next big thing to pump for a decade plus; like web pages and SaaS. But the world is smarter, more adept at catching on that it is just state management in a typical machine. The semantics are well known and do not need re-invention.
    The hilarity at an entire industry unintentionally training their replacements.
    [-]
    - HellDunkel 31 days ago
      >> Meanwhile, with local models, local RAG, and shell scripts, I am wandering 3D immersive worlds via a GPU accelerated presentation layer I vibe coded with a single 24GB GPU. Natural language driven Unreal engines are viable outputs today given local only code gen.
      what drugs are you using?
    - 3836293648 38 days ago
      What on Earth have you used to get reasonable results out of a local model?
      I've tried at every new model release (that can run on my 24GB card) and everything is still entirely useless.
      I'm not writing web stuff though.
  - IshKebab 39 days ago
    Yeah that's my view too. It's definitely fine to wait a couple of years (at least), see what emerges as most effective, and then just learn that, instead of dumping a ton of time now into keeping up with the hamster wheel.
    Unless you're in web dev because it seems like that's one of the few domains where AI actually works pretty well today.
    [-]
    - antupis 39 days ago
      Or if you like learning new stuff. Personally that has been the best part of being a programmer.
      [-]
      - yoyohello13 39 days ago
        I love learning new stuff, but for whatever reason the AI stuff doesn’t interest me. So I learn other stuff, only so much time in the day.
      - IshKebab 39 days ago
        I like learning new stuff, but not if it's going to be completely obsolete in 6 months.
  - jacquesm 39 days ago
    > Unless you're gunning for a top position as a vibe coder, this whole concept of "falling behind" is just pure FOMO.
    ???
    [-]
    - kashyapc 39 days ago
      The person you're quoting has a point. Everyone is losing their minds about this. Not everyone needs to be on top of AI developments all the time. I don't mean you ignore LLMs, just don't chase every fad.
      The classic line (which I've quoted a few times here) by Charles Mackay from 1841 comes to mind:
      "Men, it has been well said, think in herds; it will be seen that they go mad in herds, while they only recover their senses slowly, and one by one.
      "[...] In reading The History of Nations, we find that, like individuals, they have their whims and their peculiarities, their seasons of excitement and recklessness, when they care not what they do. We find that whole communities suddenly fix their minds upon one object and go mad in its pursuit; that millions of people become simultaneously impressed with one delusion, and run after it, till their attention is caught by some new folly more captivating than the first."
      — Extraordinary Popular Delusions and the Madness of Crowds
      [-]
      - jacquesm 39 days ago
        Thank you for the subtitles, it's not like I didn't understand the lingo, I just couldn't make sense of the implied meaning.
- zerr 39 days ago
  And here I am partying (coding) like it's the 90s (C++ desktop apps) and the web never happened... :)
  [-]
  - estimator7292 39 days ago
    It's pretty nice that C has garnered such hate because there's apparently very little focus on getting LLMs to write good C. It's all Rust and Python and whatever this month's fad language is. LLM fans mostly leave us alone apart from the "C bad rewrite the world in rust" crew.
    I'm very happy being decades behind the curve here. C's slowness is perfect for me.
  - atonse 39 days ago
    You’re the real unicorn!
- senordevnyc 39 days ago
  Eh, for myself as a middle-aged software engineer, it feels a little like the last chopper out of Saigon. I feel less and less confident that I can make as good a living in software for the next decade as I have for the last couple. Or if I want to. The job is changing so fast right now, and I’m not sure I like it. When I worked in big tech, I preferred being an IC over an EM or tech lead because I like writing code. Now it feels increasingly like you can’t be an IC in that way anymore. You’re now coding through others, either humans or AI.
  Sure, I can write code manually, but in my case I’m working full time on my own SaaS and I am absolutely faster and more effective with AI. It’s not even close. And the gains are so extreme that I can’t justify writing beautiful hand-crafted artisanal code anymore. It turns out that code that’s “good enough” will do, and that’s all I can afford right now.
  But long-term, I don’t know that I want to do that work, especially for some corporation. It feels like the difference between being a master furniture craftsman, and then going to work in an IKEA factory.
  [-]
  - jacquesm 39 days ago
    You'll make more money than ever cleaning up AI generated messes.
    [-]
    - pzo 39 days ago
      I had a few projects like that this year and I can say how messy and demotivating it is to clean up the mess.
      And it's actually not well paid, because the client now has the expectation that almost everything is already done, that you only have to fix a few things, and that since you even have AI at your disposal you can just write a better magic prompt.
      I think it's actually often faster and cheaper to start from scratch, or at least rewrite the whole module (of course still with AI, just with better vibe engineering rather than vibe coding).
      It's similar with house renovation - often it's just cheaper and faster to tear the whole building down rather than fix it.
      [-]
      - karmakurtisaani 39 days ago
        Would you be able to share any more details on the clean up projects you had to do? Like, was it front or back end, which tech stack, where were the LLM code issues etc.
        I'm just very curious where we are at the moment in this profession.
        [-]
        pzo 36 days ago
        the project was an iOS app, vibe coded in Claude Code - it was around half a year ago so maybe things have improved. The client actually knew some coding, so it was quite impressive how far they managed to get.
        However, it was just piling feature on top of feature without taking time to refactor. The client most likely made a few different attempts at adding a specific feature or fixing something, and there was a lot of dead code that was never used. This dead code actually confused the AI, which often tried to modify parts of the code that had been abandoned.
        There were no tests at all. No performance tests. And part of my job was to improve performance (cv/ai model inference) and robustness (crashes, memory leaks).
        I think AI is fine and useful, but what's bad about a vibe coded project that somebody hands over to you is that you have no clue which parts of the code are written/designed properly with a good foundation if the previous developer didn't test extensively and didn't refactor continuously. Even worse if you cannot talk to the previous developer responsible for the project.
        jakeydus 39 days ago
        Not OP, but I’ve spent months cleaning up sqlalchemy models that were written in isolation using AI. Project was just not scalable.
    - senordevnyc 39 days ago
      First, I’m highly skeptical of that, especially over the course of the next decade.
      Second, do you actually want to do that work? I don’t. I spent years working as a freelancer and I cleaned up a lot of shitty code from other freelancers. Not really what I want to spend my 50s doing.
      [-]
      - jacquesm 38 days ago
        Depends on what it pays. Follow the screaming.
    - varjag 39 days ago
      I greatly respect your opinions here but I really doubt that would ever happen.
      [-]
      - jacquesm 39 days ago
        It's already happening. My buddies are in the 'late bloom' phase of their careers and they are doing quite well as of late.
        AI supported coding is like four wheel drive: it will get you stuck but in harder places. The people that use these tools to reach above the level of their actual understanding are creating some very expensive problems. If you're an expert level coder and you use AI to speed up the drudgework you can get good mileage out of them, but if you're a junior pretending to be a senior you're about to cost your employer a lot of $ hiring an actual senior.
        [-]
        sod22 39 days ago
        One thing I’ve noticed is that some folks are over-confident about the benefits of LLMs and seemingly gloss over the implicit costs.
        And for good reason - the ill disciplined human body optimises for short term benefits. The disciplined body recognises the flaw in this and thinks much broader.
        atonse 39 days ago
        But wouldn’t the models get better at fixing complicated code eventually?
        [-]
        kloop 39 days ago
        We don't know. We seem to be hitting diminishing returns, but we don't exactly know where it will stop
        [-]
        aspenmartin 37 days ago
        Is there a source for this? Scaling laws work and we have about 4 orders of magnitude in the exponential growth before we run into true bottlenecks
    - kyyt 39 days ago
      [dead]
  - SoftTalker 39 days ago
    What I like to say is that writing software is getting so easy that I don't know how to do it anymore.
- causal 39 days ago
  If anything I'd expect all these tools to be easier for new engineers to adopt, unburdened by how things were before.
  [-]
  - reidrac 39 days ago
    > unburdened by how things were before.
    What burden are you talking about? Using LLMs isn't that hard, we have done harder things before.
    Sure, there will be people that refuse to "let go" and want to keep doing things the way they like them, but hey! I've been productive with vim (now neovim) for 25 years and I work with engineers that haven't mastered their IDEs at the same level. Not even close!
    Sure, they have never been "burdened" by knowing other editors before those IDEs existed, but claiming that I would have a harder time using any of those because I've mastered other tools before is ridiculous.
    [-]
    - causal 39 days ago
      Not sure how to address this without just restating TFA. Not all change builds on existing knowledge, and sometimes it is so rapid that keeping up is difficult.
- danw1979 39 days ago
  Absolutely agree.
  I took this approach when the Kubernetes hype hit and it never limited my prospects.
- constantcrying 38 days ago
  This argument only makes any sense at all because the demand for software developers continually grew.
  As long as more software developers are needed your logic obviously holds, it is irrelevant whether you are a master. There are enough jobs for "good enough". But what if "good enough" is no longer a viable economic niche? Since that niche is now entirely occupied by LLMs.
- SoftTalker 39 days ago
  People did say that in the 90s. Hence the rush to put everything on the web, whether there was any real business case for it or not. And most of it went up in flames at the end of that decade.
- smrtinsert 38 days ago
  Tell that to recruiters! If you're senior you're always expected to know everything.
hamstergene 39 days ago
I feel like many people in the comments aren't aware that Karpathy is an ML scientist for whom programming is a complementary skill, not a profession. The only reason he came up with "vibe coding" is because the maximum complexity of his hobby projects made it seem believable. Maybe take his opinions about the fate of programming with a grain of salt.
He is brilliant no doubt, but not in that field.
[-]
- nl 39 days ago
  He's a pretty decent programmer.
  It's interesting that some months ago when his nanochat project came out the HN Anti-AI crowd celebrated him saying "I tried to use claude/codex agents a few times but they just didn't work well enough at all and net unhelpful, possibly the repo is too far off the data distribution" [1]
  But now it is working for him he's suddenly not an expert...
  [1] https://news.ycombinator.com/item?id=45573521
  [-]
  - latexr 39 days ago
    What you’re calling the “crowd” was not the same people. Every time someone makes a claim like yours, I go and check and don’t see the same usernames in the conversation. “Different people have different opinions and different ways to express them” isn’t really an insight; it tells us nothing nor does it make anyone worthy of criticism.
    You can’t, in an honest argument, lump different strangers into a group you invented to accuse them of duplicity or hypocrisy.
  - hamstergene 37 days ago
    Having created 100s of nano-sized projects does not add up to having developed and maintained one large code base.
    Coding agents are eating up programming from the lowest end, starting from pressing buttons on the keyboard to type the code in: completion was literally their first application. I don't think it will go all the way to the top, though; the essential part of the profession will remain until true AGI.
    Metaphorically, think how integrated chips didn't replace electrical engineering, just changed which production tools and components engineers deal with and how.
    Obviously we all are adapting to changes, but if he or anyone else is panicking about being behind, that can only be because they've never been in too deep.
  - iLoveOncall 39 days ago
    > But now it is working for him he's suddenly not an expert...
    Or maybe he didn't lie then but is lying now?
    [-]
    - nl 38 days ago
      Calling him a liar seems fairly unnecessary? For one thing people's minds can change, or they can be talking in different contexts. Or - as in this case - new technology could have been deployed that changed the game.
- viccis 39 days ago
  Maybe that's true, but I will say that one of the reasons I recommend his Python ML videos to people is not just the ML content but also his Python is good and idiomatic. So I would not agree; I think his programming is a well practiced skill.
  FWIW though I think his predicted worldview will render it very difficult to acquire this skill, as people grow reliant on gen AI for programming rudiments.
  [-]
  - 59nadir 39 days ago
    As far as "programming skill" goes, writing "good and idiomatic" Python is pretty bottom of the barrel. I don't think the GP is all that off, most people who are famous for some programming-adjacent skill (or even programming) aren't good at programming.
    [-]
    - viccis 36 days ago
      >As far as "programming skill" goes, writing "good and idiomatic" Python is pretty bottom of the barrel.
      Complete bullshit. Beginning programmers writing good and idiomatic Python isn't "bottom of the barrel", or did you think I was recommending his videos to 20 year seasoned pros to improve their coding?
      Some people on this site need to check their arrogance and humble themselves a bit before opening their mouths.
- ex-aws-dude 39 days ago
  Exactly, I would put more weight on this if it were coming from someone who actually works as a regular programmer in the industry
- ActionHank 39 days ago
  This is such a great way to frame all his comments.
superze 41 days ago
As an Opus user, I genuinely don’t understand how someone can work for weeks or months without regularly opening an IDE. The output almost always fails.
I repeatedly rewrite prompts, restate the same constraints, and write detailed acceptance criteria, yet still end up with broken or non-functional code. It's very frustrating, to say the least. Yesterday alone I spent about $200 on generations that now require significant manual rewrites just to make them work.
At that point, the gains are questionable. My biggest success is having the model take over the first design in my app and then taking it from there, but those hundreds if not thousands of lines of code it generates are so messy that it's insanely painful to refactor them afterwards.
[-]
- throwatdem12311 39 days ago
  I have a hell of a time just getting any LLM to write SQL queries that have things like window functions, aggregates and lateral left joins - even when shoving the entire database schema DDL into the context.
  It's so frustrating, it regularly makes me want to just quit the profession. Which is why I still just write most code by hand.
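  (Editorial aside, not from the commenter: for readers unfamiliar with the features named above, here is a minimal sketch of a query that combines a window aggregate with a window ranking. It assumes Python's bundled `sqlite3` module with a SQLite build that supports window functions; the `orders` table, its columns, and the function name are hypothetical. SQLite has no LATERAL joins, so that part is not shown.)

```python
import sqlite3


def top_order_per_customer(rows):
    """Return each customer's largest order plus their total spend.

    rows: iterable of (customer, amount) tuples (hypothetical sample data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT customer, amount, total
        FROM (
            SELECT customer,
                   amount,
                   -- window aggregate: per-customer total repeated on every row
                   SUM(amount) OVER (PARTITION BY customer) AS total,
                   -- window ranking: 1 marks the largest order per customer
                   ROW_NUMBER() OVER (
                       PARTITION BY customer ORDER BY amount DESC
                   ) AS rn
            FROM orders
        )
        WHERE rn = 1
        ORDER BY customer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("ann", 10), ("ann", 30), ("bob", 5)]
    print(top_order_per_customer(sample))
```

  (Unlike `GROUP BY`, the window versions keep every input row, which is why the outer filter on `rn` is needed to pick one row per customer.)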
  [-]
  - data-ottawa 39 days ago
    I write a lot of SQL and I haven't had these issues for months, even with smaller models. Opus can one shot most of my queries faster than I could type them.
    Instead of stuffing the context with DDL I suggest:
    1. Reorganize your data warehouse. It needs to be easy to find the correct data. Make sure you use ELT clear layers, meaningful schemas, and have per-model documentation. This is a ton of work, but if done right the payoff is massive.
    2. I built a tool for myself to pull our warehouse into a graph for fuzzy search+dependency chain analysis. In the spring I made an MCP server for it and Claude uses that tool incredibly well for almost all queries. I haven't actually used the GUI or scripts since I built the MCP.
    Claude and Devstral are the best models I've used for SQL. I cannot get Gemini to write decent modern sql -- even the Gemini data science/engineer agents in Google Cloud. I occasionally try the paid models through the API and still haven't been impressed.
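    (Editorial aside, not the commenter's tool: the idea in point 2 of combining fuzzy name search with dependency-chain analysis can be sketched roughly as below. The `DEPENDS_ON` map and all model names are hypothetical; a real tool would presumably build the graph from warehouse metadata and expose it through an MCP server, neither of which is shown here.)

```python
import difflib

# Hypothetical model -> direct upstream dependencies map.
DEPENDS_ON = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "fct_revenue": ["stg_orders", "stg_customers"],
}


def fuzzy_find(name, graph=DEPENDS_ON, limit=3):
    """Return up to `limit` model names that approximately match `name`."""
    return difflib.get_close_matches(name, list(graph), n=limit, cutoff=0.6)


def upstream(model, graph=DEPENDS_ON):
    """Return every model that `model` depends on, directly or transitively."""
    seen = []
    stack = list(graph[model])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph[current])
    return sorted(seen)


if __name__ == "__main__":
    best = fuzzy_find("fct_revenu")[0]  # tolerate a misspelled model name
    print(best, "depends on", upstream(best))
```

    (An agent that can call something like `fuzzy_find` and `upstream` only needs to pull the relevant tables into context, rather than the whole schema DDL.)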
    [-]
    - enraged_camel 39 days ago
      >> I write a lot of SQL and I haven't had these issues for months, even with smaller models. Opus can one shot most of my queries faster than I could type them.
      Same. SOTA models crush every SQL question I give them.
      [-]
      - ragequittah 35 days ago
        I think this might be a big part of the problem with the conversation about AI right now. The models have become so much better in the last ~6 months in my experience and lots of people wrote them off 1-2 years ago after they couldn't do x and 'we've hit a wall' was being thrown around everywhere.
      - koolba 39 days ago
        What is SOTA?
        [-]
        phito 39 days ago
        State of the art.
  - deadbabe 39 days ago
    If you really know SQL, writing an SQL query basically just feels like writing a prompt for a database client anyway, except it does exactly what you ask for.
    [-]
    - throwatdem12311 39 days ago
      I have a running joke at work.
      * LLMs are just matrix multiplication.
      * SQL is just algebra, which has matrix multiplication as part of it.
      * Therefore SQL is AI
      * Now who is ready to invest a billion dollars in our AI SaaS company?
      Or it’s just that astronaut with a gun meme: “Wait AI is just SQL?….Always has been.”
- SkyPuncher 39 days ago
  My trick is to explicitly role-play that we’re doing a spike. This gets all of the models to ignore all of the details they normally get hung up on. Once I have the basics in place, I can tell it to fix details.
  It’s _always_ easier to add more code than it is to fix broken code.
- nowittyusername 39 days ago
  Most people have not fully grasped how LLMs work and how to properly utilize agentic coding solutions. That is the reason vibe coders end up with low quality code. It is not a limitation of the technology but of the user (at this stage). Think of it this way: everyone is the grandma who has been handed a Palm Pilot to get things done. Grandma needs an iPhone, not a Palm Pilot, but we are not in that territory yet. Now consider the people who were able to use the Palm Pilot very successfully: they were few and they were the exception, but they existed. Same here. I have been using coding agents for over 7 months now and have written zero lines of code; in fact I don't know how to code at all. But I have been able to architect very complex software projects from scratch: text to speech, automated LLM benchmarking systems for testing all possible llama.cpp sampling parameters, and more, and now I'm building my own agentic framework from scratch. All of these things and more are possible without writing one line of code yourself. But it does require understanding how to use the technology well.
  [-]
  - mirsadm 39 days ago
    If you don't know how to code then you are not able to judge what you're producing accurately.
    [-]
    - nowittyusername 39 days ago
      here you go I open sourced one of the projects https://youtu.be/EyE5BrUut2o
  - krior 39 days ago
    All of the applications you mention could be scoped as beginner projects. I don't think they represent good proofs of capability.
    [-]
    - nowittyusername 39 days ago
      Well why don't you look at it for yourself and tell me if this looks like a beginner project https://youtu.be/EyE5BrUut2o
      [-]
      - sieep 39 days ago
        Yes, this does look like a beginner project & exactly what i expected from someone who doesn't write code.
      - Eridrus 38 days ago
        This is extremely simple software.
        Claude is extremely verbose when it generates code, but this is something that should take a practicing software engineer an hour or so to write with a lot less code than Claude.
        I like all the LLM coding tools, they're constantly getting better, but I remain convinced that all the people claiming massive productivity improvements are just not good software engineers.
        I think the tools are finally at the point where they are generally a help, rather than a net waste of time for good engineers, but it's still marginal atm.
- shepherdjerred 39 days ago
  I hardly ever open an IDE anymore.
  I use Claude Code and Cursor. What I do:
  - use statically typed languages: TypeScript, Go, Rust, Python w/ types
  - Set up linters. For TS I have a bunch of custom lint rules (authored by AI) for common feedback that I've given. (https://github.com/shepherdjerred/monorepo/tree/main/package...)
  - For Cursor, lots of feedback on my desired style. https://github.com/shepherdjerred/scout-for-lol/tree/main/.c...
  - Heavy usage of plan mode. Tell AI something like "make at least 20 searches to online documentation", support every claim with a reference, etc. Tell AI "make a task for every little thing you'll implement"
  - Have the AI write tests, particularly the more expensive ones like integration and end-to-end, so you have an easy way to verify functionality.
  - Set up Claude Code GHA to automatically review PRs. Give the review feedback to the agent that implemented it, either via copy-pasting or by telling the agent "fetch review comments and fix them".
  Some examples of what I've made:
  - Many features for https://scout-for-lol.com/, a League of Legends bot for Discord
  - A program to generate TypeScript types for Helm charts (https://github.com/shepherdjerred/homelab/tree/main/src/helm...)
  - A program to summarize all of the dependency updates for my Homelab (https://github.com/shepherdjerred/homelab/tree/main/src/deps...)
  - A program to manage multiple instances of CLI agents like Claude Code (https://github.com/shepherdjerred/monorepo/tree/main/package...)
  - A Discord AI bot in the style of my friends (https://github.com/shepherdjerred/monorepo/tree/main/package...)
  [-]
  - moffkalast 39 days ago
    > make at least 20 searches to online documentation
    Lol sometimes I have to spend two turns convincing Claude to use its goddamn search and look up the damn doc instead of trying to shoot from the hip for the fifth time. ChatGPT at least has forced search mode.
    [-]
    - shepherdjerred 39 days ago
      I've found that telling it to specifically do N searches works consistently. I do really wish Claude Code had a "deep research" mode similar to 'normal' Claude.
  - throw2312321 39 days ago
    Thanks for sharing. So the dumb question - do you feel like Claude Code & Cursor have made you significantly more productive? You have an impressive list of personal projects, and I can see how a power user of AI tools can be very effective with green field projects. Does the productivity boost translate as well to your day job?
    [-]
    - shepherdjerred 39 days ago
      For personal projects, I have found it to be transformative. I've always struggled with perfection and doing the "boring parts". AI has allowed me to add lots of little nice-to-have features and focus less on the code.
      I'm lucky enough that my workplace also uses Cursor + Claude Code, so my experience directly transfers. I most often use Cursor for day-to-day work. Claude has been great as a research assistant when analyzing how data flows between multiple repos. As an example I'm writing a design doc for a new feature and Claude has been helping me with the investigation. My workflow is more or less to say: "here are my repos, here is the DB schema, here are previous design docs, now how does system X work, what would happen if I did Y, etc."
      AI is still fallible so you _do_ of course have to do lots of checking and validation which can be boring, but much easier if you add a prompt like "support every claim you make with a concrete reference".
      When it comes to implementation, I generally give it smaller, more concrete pieces to work with. e.g. for a personal project I would say something like "here is everything I want to do, make a plan, do part 1, then do part 2" (example: https://github.com/shepherdjerred/scout-for-lol/tree/227e784...)
      At work, I tend to give it PR-sized units of work. e.g. something very well-scoped and defined. My workflow is: prompt, make a PR on GitHub, add comments on GitHub, tell Cursor "I left comments on your PR, address them", repeat. Essentially I treat AI as a coworker submitting code to me.
      I don't really know that I can quantify the productivity gain. I can say that I have been _much_ more motivated in the last few months because AI removes so much friction. I think it's backed up by my commit history since June/July, which is when I started using Cursor heavily: https://github.com/shepherdjerred
  - BhavdeepSethi 39 days ago
    Cursor is an IDE.
    [-]
    - shepherdjerred 39 days ago
      Oh to clarify I used to use Cursor but the last month or two I've used Claude Code almost exclusively. Mostly because it seems to be more generous with credits.
- miguel_martin 39 days ago
  This is what an AGENTS.md - https://agents.md/ (or CLAUDE.md) file is for. Put common constraints to correct model mistakes/issues with respect to the codebase, e.g. in a “code style” section.
- tmaly 39 days ago
  What does your software creation workflow look like? Do you have a design phase?
- falcor84 40 days ago
  Why would you spend $200 a day on Opus if you can pay that for a month via the highest tier Claude Max subscription? Are you using the API in some special way?
  [-]
  - jefffoster 39 days ago
    At a guess an Enterprise API account. Pay per token but no limits.
    It’s very easy to spend $100s per dev per day.
    [-]
    - simonw 39 days ago
      The $200/month plan doesn't have limits either - they have an overage fee you can pay now in Claude Code so once you've expended your rate limited token allowance you can keep on working and pay for the extra tokens out of an additional cash reserve you've set up.
      [-]
      - merlincorey 39 days ago
        > The $200/month plan doesn't have limits either... once you've expended your rate limited token allowance... pay for the extra tokens out of an additional cash reserve you've set up
        You're absolutely right! Limited token allowance for $200/month is actually unlimited tokens when paying for extra from a cash reserve which is also unlimited, of course.
        [-]
        simonw 39 days ago
        I think you may have misunderstood something here.
        When paying for Claude Max even at $200/month there are limits - you have a limit to the number of tokens you can use per five hour period, and if you run out of that you may have to wait an hour for the reset.
        You COULD instead use an API key and avoid that limit and reset, but that would end up costing you significantly more since the $200/month plan represents such a big discount on API costs.
        As of a few weeks ago there's a third option: pay for the $200/month plan but allow it to charge you extra for tokens when you reach those limits. That gives you the discount but means your work isn't interrupted.
        Extra Usage for Paid Claude Plans: https://support.claude.com/en/articles/12429409-extra-usage-...
        [-]
        merlincorey 39 days ago
        Thank you for the explanation, but I did fully understand that is what you were saying.
        What I don't fully understand is how you can characterize that as "not limited" with a straight face; then again, I can't see your face so maybe you weren't straight faced as you wrote it in the first place.
        Hopefully you could see my well meaning smile with the "absolutely right" opening, but apparently that's no longer common so I can understand your confusion as https://absolutelyright.lol/ indicates Opus 4.5 has had it RLHF'd away.
        [-]
        simonw 39 days ago
        When I said "not limited" I meant "no longer limits your usage with a hard stop when you run out of tokens for a five hour period any more like it did until a few weeks ago".
        That's why I said "not limited" as opposed to "unlimited" - a subtle difference in word choice, I'll give you that.
    - falcor84 39 days ago
      Oh, I wasn't arguing that it isn't "easy to spend $100s per dev per day". I was just asking what the use-case for that is.
- christophilus 41 days ago
  I’ve had decent results from it. What programming language are you using?
- cloudflare728 39 days ago
  Sometimes I have a similar file or related files. I copy their names and say to use them as reference. Code quality improves by 10 times if you do so. Even providing an example from the framework's getting started guide works great for a new project.
  Yeah, the pain of cleaning up a small mess is great too. I had some failing tests and type errors, and I thought I would fix them later using only AI prompts. As the size grew, the failing TypeScript issues grew too. At some point it was 5000+ type issues and a countless number of failing unit tests. Then more and more. I tried to fix it with AI, since fixing it the old way was not possible. Then I discarded the whole project when it was around 500k lines of code.
  [-]
  - pca006132 39 days ago
    Question: How many LoC do you let the AI write for each iteration? And do you review that? It sounds like you are letting it run off leash.
    [-]
    - cloudflare728 39 days ago
      I had no idea how it would end up. It was my first time using an AI IDE. I had only used chatgpt.com and claude.ai for small changes before. I continued it as an experiment. I thought that, since the AI writes so many tests, I would judge based on the tests passing. I agree, it was bad expectations + no experience with an AI IDE + bad software engineering.
Aldipower 39 days ago
I am a software developer and mainly a programmer for decades now. I love programming. I love to be "once" with the computer. I will never give this joy up. If I need to sell shoes at daytime, I will program real computer programs in the evenings. If it won't be possible with modern machinery anymore, I will take my Commodore 64. I am a free man.
Edit: Corrected since/for. :-)
[-]
- NooneAtAll3 39 days ago
  (for decades)
  ('since' takes time_point - 'for' takes time_duration)
- computersuck 39 days ago
  you mean "one" not "once" right?
  [-]
  - PessimalDecimal 39 days ago
    Just once. Just for a night.
  - baobun 37 days ago
    If you have never experienced becoming once, try DMT or a large amount of acid on a quiet day.
noosphr 39 days ago
The only time I've felt this much behind was in high school when everyone was talking about how much sex they were having.
AI code is the Canadian girlfriend of programming.
[-]
- kqr 39 days ago
  Not being from the US, it took me a moment to realise that "she's in Canada" is a nice excuse for why nobody has met her.
- symbogra 39 days ago
  Time to shell out the $200 my friend.
  [-]
  - noosphr 39 days ago
    You need a lot more money than $200 if your code base is more than 100,000 lines.
    If only more people understood what quadratic attention means in the real world.
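A rough sketch of the arithmetic behind "quadratic attention", in Python. It assumes standard self-attention, where every token is scored against every other token in the context, so the number of pairwise scores grows with the square of the context length. The 10 tokens-per-line figure and the function names are illustrative assumptions, not measurements, and real serving costs also include terms that grow only linearly.

```python
# Illustrative only: counts pairwise attention scores for standard
# self-attention, where each token attends to every token in the context.

TOKENS_PER_LINE = 10  # assumed average; real code varies widely


def context_tokens(lines_of_code: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Estimate how many tokens a code base occupies if loaded whole."""
    return lines_of_code * tokens_per_line


def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token scores in one full self-attention pass."""
    return num_tokens * num_tokens


def relative_cost(small_loc: int, large_loc: int) -> float:
    """How many times more pairwise scores the larger code base needs."""
    return attention_pairs(context_tokens(large_loc)) / attention_pairs(context_tokens(small_loc))


if __name__ == "__main__":
    # 10x more lines means 100x more pairwise scores under these assumptions.
    print(relative_cost(10_000, 100_000))
```

Under these assumptions, going from 10,000 to 100,000 lines multiplies the pairwise scores by 100 rather than 10, which is the sense in which whole-codebase context gets expensive faster than the line count suggests.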
    [-]
    - karmakurtisaani 39 days ago
      Are you saying the LLM costs grow quadratically in the size of the code base? The prompts are already highly subsidized, can't wait to see what happens when they charge the actual price to the consumer.
      [-]
      - symbogra 39 days ago
        For now I'm gonna enjoy my VC subsidized burrito deliveries while I can.
        [-]
        ogkarlin 37 days ago
        The real question to ask is are you receiving the delivery or are you the courier training your replacement :)
- YouWhy 39 days ago
  Touché! That's a good one.
reconnecting 39 days ago
> OpenAI's sales and marketing expenses increased to _$2 billion_ in the first half of 2025.
Looks like AI companies spend enough on marketing budgets to create the illusion that AI makes development better.
Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills" for developers' brains will be glad about the choice they made.
[-]
- 9dev 39 days ago
  Well. I was a sceptic for a long time, but a friend recently convinced me to try Claude Code and showed me around. I revived an open source project that I regularly get back to: I code for a bit, have to wrestle with toil and dependency updates, and lose the joy before I really get a lot done, so I stop again.
  With Claude, all it took to fix all of that drudge was a single sentence. In the last two weeks, I implemented several big features, fixed long standing issues and did migrations to new major versions of library dependencies that I wouldn’t have tackled at all on my own—I do this for fun after all, and updating Zod isn’t fun. Claude just does it for me, while I focus on high-level feature descriptions.
  I’m still validating and tweaking my workflow, but if I can keep up that pace and transfer it to other projects, I just got several times more effective.
  [-]
  - reconnecting 39 days ago
    This sounds to me like a lack of resource management, as tasks that junior developers might perform don't match your skills, and are thus boring.
    As a creator of an open-source platform myself, I find trusting a semi-random word generator in front of users unreliable.
    Moreover, I believe it creates a bad habit. I've seen developers forget how to read documentation and instead trust AI, and of course, as a result AI makes mistakes that are hard to debug or provokes security issues that are easy to overlook.
    I know this sounds like a luddite talking, but I'm still not convinced that AI in its current state can be reliable in any way. However, because of engineers like you, AI is learning to make better choices, and that might change in the future.
    [-]
    - gejose 39 days ago
      > a semi-random word generator
      Calling tools like Claude Code a "semi-random word generator" is certainly a choice, and I suspect it won't age well.
    - pca006132 39 days ago
      > as tasks that junior developers might perform don't match your skills, and are thus boring.
      Yeah this sounds interesting, and matches my experience a bit. I was trying out AI over Christmas because people I know are talking about it. I asked it to implement something (refactoring for better performance) that I thought should be simple; it did that and it looked amazing, and all tests passed too! When I looked into the implementation, the AI had got the shape right, but the internals were more complicated than needed and were wrong. Nonetheless it got me started on fixing things, and they got fixed quite quickly.
      The performance of the model in this case is not great, perhaps it is also because I am new to this and don't know how to prompt it properly. But at least it is interesting.
      [-]
      - eichin 38 days ago
        This sounds a lot like the classic "the way to get a good answer on the internet is to post a wrong answer first", but in reverse - the AI gives you a bad version which trolls you into digging in and giving the right answer :-)
    - unethical_ban 37 days ago
      <musing aloud mode>
      I think AI coding should not be permitted in the first two years of training in CS. One should have to learn the basics of reading quality documentation, creating quality code and documentation, learning how the different pieces of software work together, and learning how to work with others.
      LLMs are great for people with some idea of what they're doing, and need "someone else" to pair program with. I agree it will cripple the architectural thinking of new learners if they never learn how to think about code on their own.
    - 9dev 39 days ago
      That’s a totally fair take IMHO, and I’m very much conflicted on several ends on this topic—for example, would I want my juniors to use an agent? No; not even the mid levels, probably. As you say, it’s easy to form bad habits, and you need a good intuition for architecture and complexity, otherwise you end up with broken, unmaintainable messes. But if you have that, it’s like magic.
- CamperBob2 39 days ago
  > Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills" for developers' brains will be glad about the choice they made.
  In that year, AI will get better. Will you?
  [-]
  - reconnecting 39 days ago
    AI is only getting better at consuming energy and wasting people's time communicating with this T9. However, if talented engineers continue to use it, it might eventually provide more accurate replies as a result.
    Answering your question, no matter how much I personally degrade or improve, I will not be able to produce anything even remotely comparable in terms of negative impact that AI brings to humanity these days.
    [-]
    - kakapo5672 39 days ago
      I see this logical pairing a lot.
      1) AI is basically useless, a mere semi-random word generator. 2) And it is so powerful that it is going to hurt (or even destroy) humanity.
      This is called "having your cake, and letting it eat you too".
      [-]
      - latexr 39 days ago
        There’s nothing incongruent about that pairing (though I also think you’re not being entirely fair in describing what your parent comment said). Atom bombs also fit: They are basically useless and they are so powerful that they can destroy humanity.
        With LLMs, the destruction is less immediate and overt, but chatbots do provable harm to people, and can be manipulated to warp our sense of reality.
        https://en.wikipedia.org/wiki/Chatbot_psychosis
        People are having romantic relationships with their chatbots and committing suicide because of them. That is harm.
        [-]
        CamperBob2 38 days ago
        > Atom bombs also fit: They are basically useless
        Let's ask your friendly local Ukrainian refugee about that.
        > People are having romantic relationships with their chatbots and committing suicide because of them. That is harm.
        So the only permissible technologies are those suitable for use by children and the mentally disturbed. I see.
        [-]
        latexr 38 days ago
        > Let's ask your friendly local Ukrainian refugee about that.
        You understand “basically useless” does not mean “entirely useless”, right? That’s why the word “basically” is there.
        I know Ukrainian people. I know Ukrainian people who are in attacked cities right now. They are friendly, and all of them would understand my point.
        > So the only permissible technologies are those suitable for use by children and the mentally disturbed. I see.
        That is a bad faith argument. HN rules ask you to not do that and steel man. It is obvious that is not what I said, “permissible” isn’t part of the argument at all. And if you think one needs to be “mentally disturbed” to be affected, you are high on arrogance and low on empathy and information. There are numerous stories of sane people becoming affected.
        https://archive.ph/2025.09.24-025805/https://www.nytimes.com...
        [-]
        CamperBob2 38 days ago
        Wait'll you hear about Dungeons & Dragons! As if backwards masking in rock and roll music weren't enough.
        You're right, I don't have much empathy for bullshit pop-psych as an instrument of motivated reasoning. If ChatGPT can convince you to kill yourself, you weren't mentally healthy to begin with, and something else would have eventually had the same effect on you. Either that, or you were an unsupervised child, victimized not by a chatbot but by your parents. A tragedy either way, but good faith requires us to place the blame where it's actually due.
        [-]
        latexr 38 days ago
        > Wait'll you hear about Dungeons & Dragons! As if backwards masking in rock and roll music weren't enough.
        I'll ask you again to not engage in bad faith.
        > If ChatGPT can convince you to kill yourself, you weren't mentally healthy to begin with, and something else would have eventually had the same effect on you.
        That is false.
        https://en.wikipedia.org/wiki/Suicide_barrier#Efficacy
        > Research has shown suicidal thinking is often short-lived. Those who attempted suicide from the Golden Gate Bridge and were stopped in the process by a person did not go on to die by suicide by some other means. There are also a variety of examples that show restricting means of suicide have been associated with the overall reduction of it.
        https://eric.ed.gov/?id=EJ195697
        https://pmc.ncbi.nlm.nih.gov/articles/PMC478945/
        [-]
        CamperBob2 38 days ago
        So now we've moved on to the topic of nets on bridges. Okey-dokey, then.
        You started by comparing ChatGPT to thermonuclear weapons, implying that it's a useless thing yet also an existential threat to humanity. State your position and desired outcome. You're all over the place here.
      - ewoodrich 39 days ago
        That's a dishonest framing of their argument. There's nothing logically inconsistent in believing wide adoption of AI tools causes developers' skills to atrophy and that the tools also fail to deliver on the hype/promises.
        You're inserting "destroy humanity" when OP is suggesting the problem is offloading all thinking to an unreliable tool (I don't entirely agree with their position but it's defensible and not as you stated).
        [-]
      - CamperBob2 39 days ago
        There's no point arguing with someone who's not only wrong, but who doesn't care if they're wrong. ("I will not be able to produce anything even remotely comparable in terms of negative impact that AI brings to humanity these days.")
        There are basically no conditions under which one party can or will reach a legitimate common ground with the other. Sucks, but that's HN nowadays.
        [-]
        reconnecting 39 days ago
        There is common ground, as per my initial message. Only one AI company spends billions of dollars yearly on marketing their software to make it work. I work on open-source software development on a bootstrapped basis.
        My input is: water, nutrition, a bit of electricity, and beliefs and the output is a fairly complex logical system like software. AI's input is billions of dollars, hundreds of thousands of people's lives spent in screen time daily, gigawatts of electricity, and still produces very questionable results.
        To answer your question in other words: if you spent the same amount of resources on human intelligence, it might bring much more impressive results in one year. However, taking into account the resources already paid into these AI technologies, humanity is unlikely to have a chance to buy out of this new 'dependency'.
        [-]
        CamperBob2 39 days ago
        > To answer your question in other words: if you spent the same amount of resources on human intelligence
        If AI tools don't amplify and magnify your own intelligence, it's not their fault.
        If the advances turn out to be illusory, on the other hand, they'll be unwound soon enough. We generally don't stick with expensive technology that doesn't work. At the same time, fortunately, we also don't generally wait for your approval before trying new things.
        [-]
        LtWorf 36 days ago
        > We generally don't stick with expensive technology that doesn't work.
        Homeopathy is still around…
        [-]
        CamperBob2 36 days ago
        D'oh, you got me there.
- nl 39 days ago
  > OpenAI's sales and marketing expenses increased to _$2 billion_ in the first half of 2025.
  I believe they include the costs of free ChatGPT users in that $2B. Worth it considering the conversion rate they are getting (5-6% in Oct 2024[1]).
  [1] https://www.cnet.com/tech/services-and-software/openai-cfo-p...
rishabhaiover 42 days ago
For the longest time, the joy of creation in programming came from solving hard problems. The pursuit of a challenge meant something. Now, that pursuit seems to be short-circuited by an animated being racing ahead under a different set of incentives. I see a tsunami at the beach, and I’m not sure whether I can run fast enough.
[-]
- condensedcrab 42 days ago
  Not to mention many companies speedrunning systems of strange and/or perverse incentives with AI adoption.
  That being said, Welch’s grape juice hasn’t put Napa valley out of business. Human taste is still the subjective filter that LLMs can only imitate, not replace.
  I view LLM assisted coding (on the sliding scale from vibe coding to fancy auto complete) similar to how Ableton and other DAW software have empowered good musicians that might not have made it otherwise due to lack of connections or money, but the music industry hasn’t collapsed completely.
  [-]
  - tjr 42 days ago
    In the music world, I would say that, rather than DAWs, LLM-assisted coding is more like LLM-assisted music creation.
    [-]
    - design2203 42 days ago
      Yep, DAWs aren’t the comparison. People are not thinking deeply about what is going on - there is a big war ongoing in order to eradicate taste and make it systematic to immensely benefit the few.
- skybrian 39 days ago
  I see it more like playing a text adventure game. You give it commands and sometimes it works, and sometimes the results are unexpected.
  [-]
  - zephen 39 days ago
    Personally, I've never been interested in being a character in someone else's story.
    But now you've got me thinking. Has anyone studied whether the programmers who are more enamored of AI are also into RPGs?
- m463 42 days ago
  > I can run fast enough.
  Can you do some code reviews while you're running?
- nextworddev 42 days ago
  (Inception scene) here a minute is seven hours
rambojohnson 39 days ago
What exhausts me isn’t “falling behind.” It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
This agentic arms race by C-suite know-nothings feels less like leverage and more like denial. We took a stochastic text generator, noticed it lies confidently and wipes entire databases and hard drives, and responded by wrapping it in managers, sub-agents, memories, tools, permissions, workflows, and orchestration layers so we don’t have to look directly at the fact that it still doesn’t understand anything.
Now we’re expected to maintain a mental model not just of our system, but of a swarm of half-reliable interns talking to each other in a language that isn’t executable, reproducible, or stable.
Work now feels duller than dishwater, enough to have forced me to career pivot for 2026.
[-]
- simonw 39 days ago
  I think AI-assisted programming may be having the opposite effect, at least for me.
  I'm now incentivized to use fewer abstractions.
  Why do we code with React? It's because synchronizing state between a UI and a data model is difficult and it's easy to make mistakes, so it's worth paying the React complexity/page-weight tax in order for a "better developer experience" that allows us to build working, reliable software with less typing of code into a text editor.
  If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
  How often have you dropped in a big complex library like Moment.js just because you needed to convert a time from one format to another, and it would take too long to hand-write that one feature (and add tests for it to make sure it's robust)? With an LLM that's a single prompt and a couple of minutes of wait.
  Using LLMs to build black box abstraction layers is a choice. We can choose to have them build FEWER abstraction layers for us instead.
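As a sketch of the kind of single-feature helper described above, here it is in Python rather than JavaScript, using only the standard library. The function name `convert_timestamp` and the example formats are hypothetical, chosen for illustration; the point is only that the helper and its tests together can stay small.

```python
from datetime import datetime


def convert_timestamp(value: str, src_format: str, dst_format: str) -> str:
    """Parse `value` using `src_format`, then re-render it in `dst_format`.

    Both formats use strptime/strftime directives. Raises ValueError when
    `value` does not match `src_format`.
    """
    return datetime.strptime(value, src_format).strftime(dst_format)


def test_convert_timestamp() -> None:
    # Numeric-only directives keep the result independent of locale.
    assert convert_timestamp("2025-12-25 14:30", "%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M") == "25/12/2025 14:30"
    # Malformed input should fail loudly rather than return garbage.
    try:
        convert_timestamp("not a date", "%Y-%m-%d", "%d/%m/%Y")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    test_convert_timestamp()
```

This covers one narrow conversion only; time zones, locales, and relative dates are where a dedicated library would start to earn its place again.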
  [-]
  - roadside_picnic 39 days ago
    > If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
    I've had plenty of junior devs justify massive code bases of random scripts and 100+ line functions with the same logic. There's a reason senior devs almost always push back on this when it's encountered.
    Everything hinges on that "if". But you're baking a tautology into your reasoning: "if LLMs can do everything we need them to, we can use LLMs for everything we need".
    The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
    So "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
    This is clearly not the case with simplistic LLM usage today. "Ah! But you need agents and memory and context management, etc!" But all of these are abstractions. This is what I believe the parent comment is really pointing out.
    If AI could do what we originally hoped it could, following simple instructions to solve complex tasks, we'd be great, and I would agree with your argument. But we are very clearly not in that world. Especially since Karpathy can't even keep up with the sophisticated machinery necessary to properly orchestrate these tools. All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.
    [-]
    - simonw 39 days ago
      I'm not arguing for using LLMs as an abstraction.
      I'm saying that a key component of the dependency calculation has changed.
      It used to be that one of the most influential facts affecting your decision to add a new library was the cost of writing the subset of code that you needed yourself. If writing that code and the accompanying tests represented more than an hour of work, a library was usually a better investment.
      If the code and tests take a few minutes those calculations can look very different.
      Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.
      The code we are producing remains the same. The difference is that a senior developer may have written that function + tests in several hours, at a cost of thousands of dollars. Now that same senior developer can produce exactly the same code at a time cost of less than $100.
      [-]
      - all_factz 39 days ago
        React is hundreds of thousands of lines of code (or millions - I haven’t looked in a while). Sure, you can start by having the LLM create a simple way to sync state across components, but in a serious project you’re going to run into edge-cases that cause the complexity of your LLM-built library to keep growing. There may come a point at which the complexity grows so much that the LLM itself can’t maintain the library effectively. I think the same rough argument applies to MomentJS.
        [-]
        simonw 39 days ago
        If the complexity grows beyond what it makes sense to do without React I'll have the LLM rewrite it all in React!
        I did that with an HTML generation project to switch from Python strings to Jinja templates just the other day: https://github.com/simonw/claude-code-transcripts/pull/2
        [-]
        DrammBA 39 days ago
        Simon, you're starting to sound super disconnected from reality, this "I hit everything that looks like a nail with my LLM hammer" vibe is new.
        [-]
        simonw 39 days ago
        My habits have changed quite a bit with Opus 4.5 in the past month. I need to write about it..
        [-]
        godelski 39 days ago
        What's concerning to many of us is that you (and others) have said this same thing s/Opus 4.5/some other model/
        That feels more like chasing than a clear line of improvement. It's very different from something like "my habits have changed quite a bit since reading The Art of Computer Programming". They're categorically different.
        [-]
        mkozlows 39 days ago
        It's because the models keep getting better! What you could do with GPT-4 was more impressive than what you could do with GPT 3.5. What you could do with Sonnet 3.5 was more impressive yet, and Sonnet 4, and Sonnet 4.5.
        Some of these improvements have been minor, some of them have been big enough to feel like step changes. Sonnet 3.7 + Claude Code (they came out at the same time) was a big step change; Opus 4.5 similarly feels like a big step change.
        (If you don't trust vibes, METR's task completion benchmark shows huge improvements, too.)
        If you're sincerely trying these models out with the intention of seeing if you can make them work for you, and doing all the things you should do in those cases, then even if you're getting negative results somehow, you need to keep trying, because there will come a point where the negative turns positive for you.
        If you're someone who's been using them productively for a while now, you need to keep changing how you use them, because what used to work is no longer optimal.
        [-]
        godelski 39 days ago
        Models keep getting better but the argument I'm critiquing stays the same.
        So does the comment I critiqued in the sibling comment to yours. I don't know why it's so hard to believe we just haven't tried. I have a Claude subscription. I'm an ML researcher myself. Trust me, I do try.
        But that last part also makes me keenly aware of their limitations and failures. Frankly I don't trust experts who aren't critiquing their field. Leave the selling points to the marketing team. The engineer and researcher's job is to be critical. To find problems. I mean how the hell do you solve problems if you're unable to identify them lol. Let the marketing team lead development direction instead? Sounds like a bad way to solve problems
        > benchmark shows huge improvements
        Benchmarks are often difficult to interpret. It is really problematic that they got incorporated into marketing. If you don't understand what a benchmark measures, and more importantly, what it doesn't measure, then I promise you that you're misunderstanding what those numbers mean.
        For METR I think they say a lot right here (emphasis my own) that reinforces my point
        > Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most *exam-style problems* for a fraction of the cost. ... And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. *They are unable to reliably handle even relatively low-skill*, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in some sense, but it is unclear how this corresponds to real-world impact.
        So make sure you're really careful to understand what is being measured. What improvement actually means. To understand the bounds.
        It's great that they include longer tasks but also notice the biases and distribution in the human workers. This is important in properly evaluating.
        Also remember what exactly I quoted. For a long time we've all known that being good at leetcode doesn't make one a good engineer. But it's an easy thing to test and the test correlates with other skills that are likely to be learned to be good at those tests (despite being able to metric hack). We're talking about massive compression machines. That pattern match. Pattern matching tends to get much more difficult as task time increases but this is not a necessary condition.
        Treat every benchmark adversarially. If you can't figure out how to metric hack it then you don't know what a benchmark is measuring (and just because you know what can hack it doesn't mean you understand it nor that that's what is being measured)
        [-]
        mkozlows 39 days ago
        I think you should ask yourself: If it were true that 1) these things do in fact work, 2) these things are in fact getting better... what would people be saying?
        The answer is: Exactly what we are saying. This is also why people keep suggesting that you need to try them out with a more open mind, or with different techniques: Because we know with absolute first-person iron-clad certainty what is possible, and if you don't think it's possible, you're missing something.
        nl 39 days ago
        I don't understand what your argument is.
        It seems to be "people keep saying the models are good"?
        That's true. They are.
        And the reason people keep saying it is because the frontier of what they do keeps getting pushed back.
        Actual, working, useful code completion in the GPT 4 days? Amazing! It could automatically write entire functions for me!
        The ability to write whole classes and utility programs in the Claude 3.5 days? Amazing! This is like having a junior programmer!
        And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!
        But now we are beginning to see that programming in 6 months' time might look very different to now because these AI systems code very differently to us. That's exactly the point.
        So what is it you are arguing against?
        I think you said you didn't like that people are saying the same thing, but in this post it seems more complicated?
        [-]
        timr 39 days ago
        > And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!
        People have been doing this parlor trick with various "substantial" programs [1] since GPT 3. And no, the models aren't better today, unless you're talking about being better at the same kinds of programs.
        [1] If I have to see one more half-baked demo of a running game or a flight sim...
        [-]
        simonw 39 days ago
        "And no, the models aren't better today"
        Can you expand on that? It doesn't match my experience at all.
        [-]
        timr 39 days ago
        It’s a vague statement that I obviously cannot defend in all interpretations, but what I mean is: the performance of models at making non-trivial applications end-to-end, today, is not practically better than it was a few years ago. They’re (probably) better at making toys or one-shotting simple stuff, and they can definitely (sometimes) crank out shitty code for bigger apps that “works”, but they’re just as terrible as ever if you actually understand what quality looks like and care to keep your code from descending into entropy.
        I think "substantial" is doing a lot of heavy lifting in the sentence I quoted. For example, I’m not going to argue that aspects of the process haven’t improved, or that Claude 4.5 isn't better than GPT 4 at coding, but I still can’t trust any of the things to work on any modestly complex codebase without close supervision, and that is what I understood the broad argument to be about. It's completely irrelevant to me if they slay the benchmarks or make killer one-shot N-body demos, and it's marginally relevant that they have better context windows or now hallucinate 10% less often (in that they're more useful as tools, which I don't dispute at all), but if you want to claim that they're suddenly super-capable robot engineers that I can throw at any "substantial" problem, you have to bring evidence, because that's a claim that defies my day-to-day experience. They're just constantly so full of shit, and that hasn't changed, at all.
        FWIW, this line of argument usually turns into a motte-and-bailey fallacy, where someone makes an outrageous claim (e.g. "models have recently gained the ability to operate independently as a senior engineer!"), and when challenged on the hyperbole, retreats to a more reasonable position ("Claude 4.5 is clearly better than GPT 3!"), but with the speculative caveat that "we don't know where things will be in N years". I'm not interested in that kind of speculation.
        [-]
        simonw 38 days ago
        Have you spent much time with Codex 5.1 or 5.2 in OpenAI Codex or Claude Opus 4.5 in Claude Code over the last ~6 weeks?
        I think they represent a meaningful step change in what models can build. For me they are the moment we went from building relatively trivial things unassisted to building quite large and complex systems that take multiple hours, often still triggered by a single prompt.
        Some personal examples from the past few weeks.
        - A spec-compliant HTML5 parsing library by Codex 5.2: https://simonwillison.net/2025/Dec/15/porting-justhtml/
        - A CLI-based transcript export and publishing tool by Opus 4.5: https://simonwillison.net/2025/Dec/25/claude-code-transcript...
        - A full JavaScript interpreter in dependency-free Python (!) https://github.com/simonw/micro-javascript - and here's that transcript published using the above-mentioned tool: https://static.simonwillison.net/static/2025/claude-code-mic...
        - A WebAssembly runtime in Python which I haven't yet published
        The above projects all took multiple prompts, but were still mostly built by prompting Claude Code for web on my iPhone in between Christmas family things.
        I have a single-prompt one:
        - A Datasette plugin that integrates Cloudflare's CAPTCHA system: https://github.com/simonw/datasette-turnstile - transcript: https://gistpreview.github.io/?2d9190335938762f170b0c0eb6060...
        I'm not confident any of these projects would have worked with the coding agents and models we had four months ago. There is no chance they would've worked with the models available in January 2025.
        [-]
        lordmauve 38 days ago
        Are you using Stop hooks to keep Claude running on a task until it completes, or is it doing that by itself?
        [-]
        simonw 38 days ago
        I'm not using those yet.
        I mainly give it clear tasks like "keep going until all these tests pass", but I do keep an eye on it and occasionally tell it to keep going.
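        A minimal sketch of what a "keep going until all these tests pass" loop amounts to, in Python. This is not how Claude Code or Stop hooks work internally; `run_tests` and `ask_agent` are hypothetical callables you would supply yourself (for example, one that shells out to a test runner and one that sends a prompt to whatever agent you use), and `max_prompts` is an assumed safety budget.

```python
from typing import Callable, Tuple


def run_until_green(
    run_tests: Callable[[], Tuple[bool, str]],
    ask_agent: Callable[[str], None],
    max_prompts: int = 10,
) -> int:
    """Re-prompt the agent until the tests pass; return how many prompts it took.

    run_tests (hypothetical) returns (passed, report_text).
    ask_agent (hypothetical) sends one follow-up prompt to the agent.
    """
    prompts_sent = 0
    while True:
        passed, report = run_tests()
        if passed:
            return prompts_sent
        if prompts_sent >= max_prompts:
            # Stop burning tokens and hand control back to the human.
            raise RuntimeError(
                "tests still failing after %d prompts" % max_prompts
            )
        ask_agent("Keep going until all these tests pass:\n" + report)
        prompts_sent += 1
```

        The budget is the part that stands in for "keeping an eye on it": without a cap, a loop like this runs for as long as the tests stay red.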
        timr 38 days ago
        I’ve used Sonnet 4.5 and Codex 5 and 5.1, but not in their native environment [1].
        Setting aside the fact that your examples are mostly “replicate this existing thing in language X” [2], again, I’m not saying that the models haven’t gotten better at crapping out code, or that they’re not useful tools. I use them every day. They're great tools, when someone actually intelligent is using them. I also freely concede that they're better tools than a year ago.
        The devil is (as always) in the details: how many prompts did it take? what exactly did you have to prompt for? how closely did you look at the code? how closely did you test the end result? Remember that I can, with some amount of prompting, generate perfectly acceptable code for a complex, real-world app, using only GPT 4. But even the newest models generate absolute bullshit on a fairly regular basis. So telling me that you did something complex with an unspecified amount of additional prompting is fine, but not particularly responsive to the original claim.
        [1] Copilot, with a liberal sprinkling of ChatGPT in the web UI. Please don’t engage in “you’re holding it wrong” or "you didn't use the right model" with me - I use enough frontier models on a regular basis to have a good sense of their common failings and happy paths. Also, I am trying to do something other than experiment with models, so if I have to switch environments every day, I’m not doing it. If I have to pay for multiple $200 memberships, I’m not doing it. If they require an exact setup to make them “work”, I am unlikely to do it. Finally, if your entire argument here hinges on a point release of a specific model in the last six weeks…yeah. Not gonna take that seriously, because it's the same exact argument, every six weeks. </caveats>
        [2] Nothing really wrong with this -- most programming is an iterative exercise of replicating pre-existing things with minor tweaks -- but we're pretty far into the bailey now, I think. The original argument was that you can one-shot a complex application. Now we're in "I can replicate a large pre-existing thing with repeated hand-holding". Fine, and completely within my own envelope for model performance, but not really the original claim.
        [-]
        simonw 38 days ago
        I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?
        Copilot style autocomplete or chatting with a model directly is an entirely different experience from letting the model spend half an hour writing code, running that code and iterating on the result uninterrupted.
        Here's an example where I sent a prompt at 2:38pm and it churned away for 7 minutes (executing 17 bash commands), then I gave it another prompt and it churned for half an hour and shipped 7 commits with 160 passing tests: https://static.simonwillison.net/static/2025/claude-code-mic...
        I completed most of that project on my phone.
        [-]
        timr 38 days ago
        > I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?
        edit: I wrote a different response here, then I realized we might be talking about different things.
        Are you asking if I let the agents use tools without my prior approval? I do that for a certain subset of tools (e.g. run tests, do requests, run queries, certain shell commands, even use the browser if possible), but I do not let the agents do branch merges, deploys, etc. I find that the best models are just barely good enough to produce a bad first draft of a multi-file feature (e.g. adding an entirely new controller+view to a web app), and I would never ever consider YOLOing their output to production unless I didn't care at all. I try to get to tests passing clean before even looking at the code.
        Also, while I am happy to let Copilot burn tokens in this manner and will regularly do it for refactors or initial drafts of new features, I'm honestly not sure if the juice is worth the squeeze -- I still typically have to spend substantial time reworking whatever they create, and the revision time required scales with the amount of time they spend spinning. If I had to pay per token, I'd be much more circumspect about this approach.
        [-]
        simonw 38 days ago
        Yes, that's what I meant. I wasn't sure if you meant classic tab-based autocomplete or tool-based agent Copilot.
        Letting it burn tokens on running tests and refactors (but not letting it merge branches or deploy) is the thing that feels like a huge leap forward to me. We are talking about the same set of capabilities.
        [-]
        timr 37 days ago
        Ah, definitely agent-based copilot. I don't even have the autocomplete stuff turned on anymore, because I found it annoying.
        nl 38 days ago
        What do you class a "substantial program"?
        For me it is something I can describe in a single casual prompt.
        For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.
        [-]
        timr 38 days ago
        Good question. No strict line, and it's always going to be subjective and a little bit silly to categorize, but when I'm debating this argument I'm thinking: a product that does not exist today (obviously many parts of even a novel product will be completely derivative, and that's fine), with multiple views, controllers, and models, and a non-trivial amount of domain-specific business logic. Likely 50k+ lines of code, but obviously that's very hand-wavy and not how I'd differentiate.
        Think: SaaS application that solves some domain specific problem in corporate accounting, versus "in-browser spreadsheet", or "first-person shooter video game with AI, multi-player support, editable levels, networking and high-resolution 3D graphics" vs "flappy bird clone".
        When you're working on a product of this size, you're probably solving problems like the ones cited by simonw multiple times a week, if not daily.
        [-]
        nl 38 days ago
        I don't think anyone is claiming they can one-shot a 50k line SAAS app.
        I think you'd get close on something like Lovable but that's not really one shot either.
        [-]
        nl 38 days ago
        But re-reading your statement you seem to be claiming that there are no 50k-line SAAS apps that are built even using multi-shot techniques (i.e., building a feature at a time).
        In that case my Vibe-Prolog project would count: https://github.com/nlothian/Vibe-Prolog/
        - It's 45K of Python code
        - It isn't a duplicate of another program (indeed, the reason it isn't finished is because it is stuck between ISO Prolog and SWI Prolog and I need to think about how to resolve this, but I don't know enough Prolog!)
        - Not a *single* line of code is hand written.
        Ironically this doesn't really prove that the current frontier models are better because large amounts of code were written with non-frontier models (You can sort of get an idea of what models were used with the labels on https://github.com/nlothian/Vibe-Prolog/pulls?q=is%3Apr+is%3...)
        But - importantly - this project is what convinced me that the frontier models are much better than the previous generation. There were numerous times I tried the same thing in a non-Frontier model which couldn't do it, and then I'd try it in Claude, Codex or Gemini and it would succeed.
        pianopatrick 39 days ago
        Is there an endpoint for AI improvement? If we can go from functions to classes to substantial programs then it seems like just a few more steps to rewriting whole software products and putting a lot of existing companies out of business.
        "AI, I don't like paying for my SAP license, make me a clone with just the features I need".
        godelski 38 days ago
        Two things seem to be in contention:
        - Models keep getting better[0]
        - Models since GPT 3 are able to replace junior developers
        It's true that both of these can be true at the same time but they are still in contention. We're not seeing agents ready to replace mid level engineers, and quite frankly I've yet to see a model actually ready to replace juniors. Possibly low end interns but the major utility of interns is to trial run employment. Frankly it still seems like interns and juniors are advancing faster than these models in the type of skills that matter for companies (not to mention that institutional knowledge is quite valuable). But there are interns that started when GPT 3.5 came out that are seniors now.
        The problem is we've been promised that these employees would be replaced[1] any day now, yet that's not happening.
        People forget, it is harder to advance when you're already skilled. It's not hard to go from non-programmer to a junior level. Hard to go from junior to senior. And even harder to advance to staff. The difficulty level only increases. This is true for most skills and this is where there's a lot of naivety. We can be advancing faster while the actual capabilities begin to crawl forward rather than leap.
        [0] Implication is not just at coding test style questions but also in more general coding development.
        [1] Which has another problem in the pipeline. If you don't have junior devs and are unable to replace both mid and seniors by the time that a junior would advance to a senior then you have built a bubble. There's a lot of big bets being made that this will happen yet the evidence is not pointing that way.
        pertymcpert 39 days ago
        Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.
        Why do you use the word "chasing" to describe this? I don't understand. Maybe you should try it and compare it to earlier models to see what people mean.
        [-]
        godelski 39 days ago
        > Why do you use the word "chasing" to describe this?
        I think you'll get the answer to this if you read my comment and your response to understand why you didn't address mine.
        Btw, I have tried it. It's annoying that people think the problem is not trying. It was getting old when GPT 3.5 came out. Let's update the argument...
        v64 39 days ago
        Looking forward to hearing about how you're using Opus 4.5, from my experience and what I've heard from others, it's been able to overcome many obstacles that previous iterations stumbled on
        remich 39 days ago
        Please do. I'm trying to help other devs in my company get more out of agentic coding, and I've noticed that not everyone is defaulting to Opus 4.5 or even Codex 5.2, and I'm not always able to give good examples to them for why they should. It would be great to have a blog post to point to…
        indigodaddy 39 days ago
        Can you expound on Opus 4.5 a little? Is it so good that it's basically a superpower now? How does it differ from your previous LLM usage?
        [-]
        pertymcpert 39 days ago
        To repeat my other comment:
        > Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.
        dimitri-vs 39 days ago
        Reality is we went from LLMs as chatbots editing a couple files per request with decent results to running multiple coding agents in parallel to implement major features based on a spec document and some clarifying questions - in a year.
        Even IF LLMs don't get any better there is a mountain of lemons left to squeeze in their current state.
        zdragnar 39 days ago
        That would go over on any decently sized team like a lead balloon.
        [-]
        simonw 39 days ago
        As it should, normally, because "we'll rewrite it in React later" used to represent weeks if not months of massively disruptive work. I've seen migration projects like that push on for more than a year!
        The new normal isn't like that. Rewriting an existing cleanly implemented Vanilla JavaScript project (with tests) in React is the kind of rote task you can throw at a coding agent like Claude Code and come back the next morning and expect most (and occasionally all) of the work to be done.
        [-]
        reactordev 39 days ago
        I’m going to add my perspective here as they seem to all be ganging up on you Simon.
        He is right. The game has changed. We can now refactor using an agent and have it done by morning. The cost of architectural mistakes is minimal and if it gets out of hand, you refactor and take a nap anyway.
        What’s interesting is now it’s about intent. The prompts and specs you write, the documents you keep that outline your intended solution, and you let the agent go. You do research. Agent does code. I’ve seen this at scale.
        zdragnar 39 days ago
        And everyone else's work has to be completely put on hold or thrown away because you did the whole thing all at once on your own.
        That's definitely not something that goes over well on anything other than an incredibly trivial project.
        [-]
        pertymcpert 39 days ago
        Why did you jump to the assumption that this:
        > The new normal isn't like that. Rewriting an existing cleanly implemented Vanilla JavaScript project (with tests) in React is the kind of rote task you can throw at a coding agent like Claude Code and come back the next morning and expect most (and occasionally all) of the work to be done.
        ... meant that person would do it in a clandestine fashion rather than this be an agreed upon task prior? Is this how you operate?
        [-]
        zdragnar 39 days ago
        My very first sentence:
        > And everyone else's work has to be completely put on hold
        On a big enough team, getting everyone to a stopping point where they can wait for you to do your big bang refactor to the entire code base- even if it is only a day later- is still really disruptive.
        The last time I went through something like this, we did it really carefully, migrating a page at a time from a multi page application to a SPA. Even that required ensuring that whichever page transitioned didn't have other people working on it, let alone the whole code base.
        Again, I simply don't buy that you're going to be able to AI your way through such a radical transition on anything other than a trivial application with a small or tiny team.
        nl 39 days ago
        > meant that person would do it in a clandestine fashion rather than this be an agreed upon task prior? Is this how you operate?
        This doesn't mean this at all.
        In an AI heavy project it's not unusual to have many speculative refactors kicked off and then you come back to see what it is like.
        Wonder if you can do a Rust SIMD-optimized version of that Numpy code you have? Try it! You don't even need to waste review time on it because you have heavy test coverage and can see if it is worth looking at.
        zeroonetwothree 39 days ago
        If you have 100s of devs working on the project it’s not possible to do a full rewrite in one go. So it's not about being clandestine but rather that there’s just no way to get it done regardless of how much AI superpowers you bring to bear.
        Teever 39 days ago
        Let's say I'm mildly convinced by your argument. I've read your blog post that was popular on HN a week or so ago and I've made similar little toy programs with AI that scratch a particular niche.
        Do you care to make any concrete predictions on when most developers will embrace this new normal as part of their day to day routine? One year? Five?
        And how much of this is just another iteration in the wheel of reincarnation[0]? Maybe we're looking at a future where we see a return to the monoculture, library-dense supply chain that we use today, but the libraries are made by swarms of AI agents instead and the programmer/user is responsible for guiding other AI agents to create business logic?
        [0] https://www.computerhope.com/jargon/w/wor.htm
        [-]
        simonw 39 days ago
        It's really hard to predict how other developers are going to work, especially given how resistant a lot of developers are to fully exploring the new tools.
        I do think there's been a bit of a shift in the last two months, with GPT 5.1 and 5.2 Codex and Opus 4.5.
        We have models that can reliably follow complex instructions over multiple hour projects now - that's completely new. Those of us at the cutting edge are still coming to terms with the consequences of this (as illustrated by this Karpathy tweet).
        I don't trust my predictions myself, but I think the next few months are going to see some big changes in terms of what mainstream developers understand these tools as being capable of.
        mkozlows 39 days ago
        "The future is already here, it's just unevenly distributed."
        At some companies, most developers already are using it in their day to day. IME, the more senior the developer is, the more likely they are to be heavily using LLMs to write all/most of their code these days. Talking to friends and former coworkers at startups and Big Tech (and my own coworkers, and of course my own experience), this isn't a "someday" thing.
        People who work at more conservative companies, the kind that don't already have enterprise Cursor/Anthropic/OpenAI agreements, and are maybe still cautiously evaluating Copilot... maybe not so much.
        chairmansteve 39 days ago
        "React is hundreds of thousands of lines of code".
        Most of which are irrelevant to my project. It's easier to maintain a few hundred lines of self written code than to carry the react-kitchen-sink around for all eternity.
        wanderlust123 39 days ago
        Not all UIs converge to a React like requirement. For a lot of use cases React is over-engineering but the profession just lacks the balls to use something simpler, like htmx for example.
        [-]
        all_factz 39 days ago
        Sure, and for those cases I’d rather tell the agent to use htmx instead of something hand-rolled.
        zeroonetwothree 39 days ago
        Core react is fairly simple, I would have no problem using it for almost everything. The overengineering usually comes at a layer on top.
      - squigz 39 days ago
        > Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.
        They're not being disrupted. This is exactly why some people don't trust LLMs to re-invent wheels. It doesn't matter if it can one-shot some code and tests - what matters is that some problems require experience to know what exactly is needed to solve that problem. Libraries enable this experience and knowledge to centralize.
        When considering whether inventing something in-house is a good idea vs using a library, "up front dev cost" factors relatively little to me.
        [-]
        joquarky 39 days ago
        Don't forget to include supply chain attacks in your risk assessment.
      - qazxcvbnmlp 39 days ago
        Without commenting if parent is right or wrong. (I suspect it is correct)
        If it's true, the market will soon reward it. Being able to competently write good code cheaper will be rewarded. People don't employ programmers because they care about them, they are employed to produce output. If someone can use LLMs to produce more output for less $$ they will quickly make the people that don't understand the technology less competitive in the workplace.
        [-]
        zx8080 39 days ago
        > more output for less $$
        That's a trap: it's not obvious to those without experience in both business and engineering how to estimate, or later calculate, this $$. The trap is in the cost of changes and the fix budget when things break. And things will break. Often. Also, the requirements will change often; that's normal (our world is not static). So the cost has some tendency to change (guess which direction). The thoughtless copy-paste and rewrite-everything approach is nice, but the cost climbs steeply over time. Those who don't know this will be trapped and lose business.
        [-]
        tbrownaw 39 days ago
        Predicting costs may be tricky, but measuring them after the fact is a fair bit easier.
        [-]
        zx8080 39 days ago
        Going without prediction is like landing a B787 totally blind, without any instruments or visual reference.
        It will not just hurt, it will kill a business.
      - brians 39 days ago
        A major difference is when we have to read and understand it because of a bug. Perhaps the LLM can help us find it! But abstraction provides a mental scaffold
        [-]
        godelski 39 days ago
        I feel like "abstraction" is overloaded in many conversations.
        Personally I love abstraction when it means "generalize these routines to a simple and elegant version". Even if it's harder to understand than a single instance it is worth the investment and gives far better understanding of the code and what it's doing.
        But there's also abstraction meaning to make less understandable or more complex and I think LLMs operate this way. It takes a long time to understand code. Not because any single line of code is harder to understand but because they need to be understood in context.
        I think part of this is in people misunderstanding elegance. It doesn't mean aesthetically pleasing, but to do something in a simple and efficient way. Yes, write it rough the first round but we should also strive for elegance. It seems more like we are just trying to get the first rough draft and move on to the next thing.
    - cameronh90 39 days ago
      Rather, the problem I more often see with junior devs is pulling in a dozen dependencies when writing a single function would have done the job.
      Indeed, part of becoming a senior developer is learning why you should avoid left-pad but accept date-fns.
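      To make the left-pad half of that concrete, here is a minimal sketch in Python (chosen purely for illustration; the `left_pad` helper is hypothetical and is not the npm package's code). It shows roughly how much logic sits behind that kind of dependency:

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad text on the left with fill until it is at least width characters."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    missing = width - len(text)
    # Text that is already wide enough is returned unchanged.
    return fill * missing + text if missing > 0 else text
```

      Python's built-in `str.rjust` already does the same job, which is rather the point: a few lines you can read and test in full. A date library has no equivalent one-screen version.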
      We’re still in the early stages of operationalising LLMs. This is like mobile apps in 2010 or SPA web dev in 2014. People are throwing a lot of stuff at the wall and there’s going to be a ton of churn and chaos before we figure out how to use it and it settles down a bit. I used to joke that I didn’t like taking vacations because the entire front end stack will have been chucked out and replaced with something new by the time I get back, but it’s pretty stable now.
      Also I find it odd you’d characterise the current LLM progress as somehow being below where we hoped it would be. A few years back, people would have said you were absolutely nuts if you’d have predicted how good these models would become. Very few people (apart from those trying to sell you something) were exclaiming we’d be imminently entering a world where you enter an idea and out comes a complex solution without any further guidance or refining. When the AI can do that, we can just tell it to improve itself in a loop and AGI is just some GPU cycles away. Most people still expect - and hope - that’s a little way off yet.
      That doesn’t mean the relative cost of abstracting and inlining hasn’t changed dramatically or that these tools aren’t incredibly useful when you figure out how to hold them.
      Or you could just do what most people always do and wait for the trailblazers to either get burnt or figure out what works, and then jump on the bandwagon when it stabilises - but accept that when it does stabilise, you’ll be a few years behind those who have been picking shrapnel out of their hands for the last few years.
    - whstl 39 days ago
      > The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
      Hyperbole. It's also very often a "world of pain" with a lot of senior code.
    - mannanj 39 days ago
      > things will break and when they do, it will incur a world of pain
      How much of this is still true, and how much is exaggerated, in today's environment where the cost of making things is near 0?
      I think “Evolution” would say that the cost of producing is near 0 so the possibility of creating what we want is high. The cost of trying again is low so mistakes and pain aren’t super high. For really high-stakes situations (which most situations are not), bring the expert human into the loop until the AI is a better expert than that human.
    - bdangubic 39 days ago
      > All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.
      the people are telling you “you are not doing it right!” - that’s it, there is nothing to interpret in addition to this basic sentence
    - neoromantique 39 days ago
      I'm sorry, but I don't agree.
      Look at the dependency hell that is modern development: just how wide the openings are for supply chain attacks, and seemingly every other week we get a new RCE.
      I'd rather have 100 loosely coupled scripts peer reviewed by half a dozen LLM agents.
      [-]
      - pca006132 39 days ago
        But this doesn't solve dependency hell. If the functionalities were loosely coupled, you can already vendor the code in and manually review them. If they are not, say it is a db, you still have to depend on that?
        Or maybe you can use AI to vendor dependencies, review existing dependencies and updates. Never tried that, maybe that is better than the current approach, which is just trusting the upstream most of the time until something breaks.
        [-]
        neoromantique 39 days ago
        When I need 1% of library's functionality, I can use AI to generate me a good enough replacement that does not require shipping any vendor code.
        Will it be potentially more fragile and less featured? Sure, but it also will not bring in a thousand packages of dependencies.
        joquarky 39 days ago
        Are you really going to manually review all of moment.js just to format a date?
        [-]
        pca006132 39 days ago
        By vendoring the code in, in this case I mean copying the related code into the project. You don't review everything. It is a bad way to deal with dependencies, but it feels similar to how people are using LLMs now for utility functions.
    - baq 39 days ago
      > "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
      Ignoring for a second that they actually already are, it doesn’t matter because the cost of rewriting the mess drops by an order of magnitude with each frontier model release. You won’t need good code because you’ll be throwing everything away all the time.
      [-]
      - bspinner 39 days ago
        I've yet to understand this argument. If you replace a brown turd with a yellowish turd, it'll still be a turd.
        [-]
        PaulHoule 39 days ago
        In everyday life I am a plodding and practical programmer who has learned the hard way that any working code base has numerous “fences” in the Chesterton sense.
        I think, though, that for small systems and small parts of systems LLMs do move the repair-replace line in the replace direction, especially if the tests are good.
  - sshine 39 days ago
    > I'm now incentivized to use less abstractions.
    I'm incentivised to use abstractions that are harder to learn, but execute faster or more safely once compiled. E.g. more Rust, Lean.
    > If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
    LLMs benefit from abstractions the same way as we do.
    LLMs currently copy our approaches to solving problems and copy all the problems those approaches bring.
    Letting LLMs skip all the abstractions is about as likely to succeed as genetic programming is efficient.
    For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
    In a recent interview with Bret Weinstein, a former professor of evolutionary biology, he proposed that one property of evolution that makes the story of one species evolving into another more likely is that it's not just random permutations of single genes; it's also permutations to counter variables encoded as telomeres and possibly microsatellites.
    https://podcasts.happyscribe.com/the-joe-rogan-experience/24...
    Bret compares this to flipping random bits in a program to make it work better vs. tweaking variables randomly in a high-level language. Mutating parameters at a high-level for something that already works is more likely to result in something else that works than mutating parameters at a low level.
    So I believe LLMs benefit from high abstractions, like us.
    We just need good ones; and good ones for us might not be the same as good ones for LLMs.
    [-]
    - simonw 39 days ago
      > For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
      Right, but I'm also getting pages that load faster and don't require a build step, making them more convenient to hack on. I'm enjoying that trade-off a lot.
      [-]
      - joquarky 39 days ago
        Vanilla JS is also a lot more capable than it was when React was invented.
        And yeah, you can't beat the iteration speed.
        I feel like there are dozens of us.
        [-]
        mycall 38 days ago
        Same with the latest CSS removing need for JS in many use cases in HTML.
    - quleap 39 days ago
      Exactly. LLMs are a lot like human developers: they benefit from existing abstractions. Reinventing everything from scratch is a recipe for disaster—especially given an LLM’s limited context window.
  - rdhatt 39 days ago
    I find it interesting that for your example you chose Moment.js -- a time library -- instead of something utilitarian like Lodash. For years I've been following Jon Skeet's blog about implementing his time library NodaTime (a port of JodaTime). There are a crazy number of edge cases and many unintuitive things about modeling time within a computer.
    If I just wanted the equivalent of Lodash's _.intersection() method, I get it. The requirements are pretty straightforward and I can verify the LLM code & tests myself. One less dependency is great. But with time, I know I don't know enough to verify the LLM's output.
    Similar to encryption libraries, it's a common recommendation to leave time-based code to developers who live and breathe those black boxes. I trust the community to verify the correctness of those concepts, something I can't do myself with LLM output.
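    A minimal sketch of the "easy to verify" case mentioned above: a small helper that roughly mirrors what Lodash's _.intersection() does (unique values from the first list that appear in every other list, in first-list order). The name `intersection` and the use of hashing for equality are assumptions for illustration; Lodash itself compares with SameValueZero, so edge cases such as NaN can differ.

```python
from typing import Hashable, Iterable, TypeVar

T = TypeVar("T", bound=Hashable)


def intersection(first: Iterable[T], *others: Iterable[T]) -> list[T]:
    """Return the unique values of `first` that occur in every other iterable.

    Order follows the first appearance in `first`. Values must be hashable
    (an assumption of this sketch, not a property of Lodash).
    """
    other_sets = [set(other) for other in others]
    seen: set[T] = set()
    result: list[T] = []
    for value in first:
        if value in seen:
            continue  # keep only the first occurrence of each value
        seen.add(value)
        if all(value in other for other in other_sets):
            result.append(value)
    return result
```

    The whole contract fits in a handful of assertions, which is what makes this kind of helper reviewable by hand in a way that a time zone implementation is not.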
  - tyre 39 days ago
    For moment you can use `date-fns` and tree shake.
    I'd rather have LLMs build on top of proven, battle-tested production libraries than keep writing their own from scratch. You're going to fill up context with all of its re-invented wheels when it already knows how to use common options.
    Not to mention that testing things like this is hard. And why waste time (and context and complexity) for humans and LLMs trying to do something hard like state syncing when you can focus on something else?
    [-]
    - simonw 39 days ago
      Every dependency carries a cost. You are effectively outsourcing part of the future maintenance of your project to an external team.
      This can often be a very solid bet, but it can also occasionally backfire if the library you chose falls out of date and is no longer maintained.
      For this reason I lean towards fewer dependencies, and have a high bar for when a dependency is worth adding to a project.
      I prefer a dozen well vetted dependencies to hundreds of smaller ones that each solve a problem that I could have solved effectively without them.
      [-]
      - tyre 39 days ago
        For smol things like left-pad, sure but the two examples given (moment and react) solve really hard problems. If I were reviewing a PR where someone tried to re-implement time zone handling in JS, that’s not making it through review.
        In JS, the DOM and time zones are some of the most messed up foundations you’re building on top of ime. (The DOM is amazing for documents but not designed for web apps.)
        I think we really need to be careful about adding dependencies that we’re maintaining ourselves, especially when you factor in employee churn and existing options. Unless it’s the differentiator for the business you’re building, my advice to engineers is to strongly consider other options and have a case for why they don’t fit.
        AI can play into the engineering blind spot of building it ourselves because it’s fun. But engineering as a discipline requires restraint.
        [-]
        simonw 39 days ago
        Whether that's true about React and Moment varies on a case-by-case basis.
        If you're building something simple like a contact form React may not be the right choice. If you're building something like Trello that calculation is different.
        Likewise, I wouldn't want Moment for https://tools.simonwillison.net/california-clock-change but I might want it for something that needs its more advanced features.
  - travisgriggs 39 days ago
    Has anyone tried the experiment that is sort of implied here? I was wondering earlier today, what it would be like to pick a simple app, pick an OS, and just tell an LLM to write that app using only machine code and native ADKs, and skip all intermediate layers?
    We seem to have created a large bureaucracy for software development, where telling a computer how to execute an app involves keeping a lot of cogs in a big complicated machine happy. But why use the automation to just roll the cogs? Why not just simplify/streamline? Does an LLM need to worry about using the latest and greatest abstractions? I have to assume this has been tried already...
  - nzoschke 39 days ago
    Right there with you.
    I'm instructing my agents to do old school boring form POST, SSR templates, and vanilla JS / CSS.
    I previously shifted away from this to abstractions because typing all the boilerplate was tedious.
    But now that I'm not typing, the tedious but simple approach is great for the agent writing the code, and great for the people doing code reviews.
  - qingcharles 39 days ago
    LLMs also have encyclopedic knowledge. Several times LLMs have found some huge block of code I wrote and reduced it down to a few lines. The other day they removed several thousand lines of brittle code I wrote previously for some API calls with a well-tested package I didn't know about. Literally thousands down to dozens.
    My code is constantly shrinking, becoming better quality, more performant, more best-practice on a daily basis. And I'm learning like crazy. I'm constantly looking up changes it recommends to see why and what the reasons are behind them.
    It can be a big damned dummy too, though. Just today it was proposing a massive server-side script to work around an issue with my app I was deploying, when the actual solution was to just make a simple one-line change to the app. ("You're absolutely right!")
  - throwaway150 39 days ago
    > If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
    But this is a highly non-trivial problem. How do you even possibly manually verify that the test suite is complete and tests all possible corner cases (of which there are so many because synchronizing state is a hard problem)?
    At least React solves this problem in a non-stochastic, deterministic manner. What can be a good reason to replace something like React that works deterministically with LLM-assisted code that is generated stochastically, when there's no easy way to manually verify if the implementation or the test suite is correct and complete?
    [-]
    - mlinhares 39 days ago
      You don't, same as for the "generate momentjs and use it". People now firmly believe they can use an LLM to build custom versions of these libraries and rewrite whole ecosystems out of nowhere because Claude said "here's the code".
      I've come to realize fighting this is useless, people will do this, it's going to create large fuck ups and there will be heaps of money to be made on the cleanup jobs.
      [-]
      - bonesss 39 days ago
        I think the gap between people dealing with JavaScript cruft all day and backend large systems development is creating a massive conversational disconnect… like, this thread is plain-faced and seriously discussing reinventing date handling locally for funsies.
        I also think that any company creating a reverse-centaur workforce of blind and dumb half baked devs ritualistically shaking chicken bones at their pay-as-you-go automaton has effectively outsourced their core business to OpenAI/MS while paying for the privilege. And, on the twenty year timeline as service and capital costs create crunches, those mega corps will literally be sitting on whole copies of internal business schematics and critical code of their subservient customers…
        They say things, they do other things. Trusting Microsoft not to eat your sector through abusive partner programs and licensing entanglements backed with government capture? Surely the LLMs can explain how that has gone historically and how smart that is going forward.
        [-]
        mlinhares 38 days ago
        They’ve done this before in their locked environments and programming languages, anyone that doesn’t think this is going to end the same way is delusional.
        I’m starting to think actually knowing how to write code might end up being a superpower with so many people completely lost to the stochastic parrots. I’m already getting inbounds from friends and acquaintances that need “help” with their generated shit, gonna start asking for money for it.
      - pertymcpert 39 days ago
        There's going to be lots of fuck ups, but with frontier models improving so much there's also going to be lots of great things made. Horrible, soul crushing technical debt addressed because it was offloaded to models rather than spending a person's thought and sanity on it.
        I think overall for engineering this is going to be a net positive.
  - fweimer 39 days ago
    > If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
    I'm worried that there is a tendency in LLM-generated code to avoid even local abstractions, such as putting common code into separate (local) functions, or even using records/structures. You end up with code that is best maintained with an LLM, which is good for the LLM provider and their future revenue. But we humans as reviewers and ultimate long-term maintainers benefit from those minor abstractions.
    [-]
    - simonw 39 days ago
      Yeah, I find myself needing to watch out for that. I'll frequently say "refactor that to reduce duplicated code" - which is generally very safe once the LLM has added test coverage for the new feature.
  - azangru 39 days ago
    > Why do we code with React?
    ...is a loaded question, with a complex and nuanced answer. Especially when you continue:
    > it's worth paying the React complexity/page-weight tax
    All right; then why do we code in React when a smaller alternative, such as Preact, exists, which solves the same problem, but for a much lower page-weight tax?
    Why do we code in React when a mechanism to synchronize data with tiny UI fragments through signals exists, as exemplified by Solid?
    Why do people use React to code things where data doesn't even change, or changes so little that to sync it with the UI does not present any challenge whatsoever, such as blogs or landing pages?
    I don't think the question 'why do we code with React?' has a simple and satisfactory answer anymore. I am sure marketing and educational practices play a large role in it.
    [-]
    - simonw 39 days ago
      Yeah, I share all of those questions.
      My cynical answer is that most web developers who learned their craft in the last decade learned frontend React-first, and a lot of them genuinely don't have experience working without it.
      Which means hiring for a React team is easier. Which means learning React makes you more employable.
      [-]
      - whstl 39 days ago
        > most web developers who learned their craft in the last decade learned frontend React-first, and a lot of them genuinely don't have experience working without it
        That's not cynical, that's the reality.
        I do a lot of interviews and mentor juniors, and I can 100% confirm that.
        And funny enough, React-only devs was a bigger problem 5 years ago.
        Today the problem is developers who can *only* use Next.js. A lot can't use Vite+React or plain React, or whatever.
        And about 50% of Ruby developers I interviewed from 2022-2024 were unable to code a FizzBuzz in Ruby without launching a whole Rails project.
        [-]
        CharlieDigital 39 days ago
        My test for FE is to write a floating menu in JSFiddle with only JS, CSS, and HTML. Bonus if no JS.
        If you can do that, then you can probably understand how everything else works.
        [-]
        whstl 39 days ago
        Yep, that's a good test. And it's good even if it's for a React only position.
        azangru 39 days ago
        >> a lot of them genuinely don't have experience working without [react]
        > Today the problem is developers who can only use Next.js. A lot can't use Vite+React or plain React, or whatever.
        Do you want to hire such developers?
        [-]
        whstl 39 days ago
        No, that's why I said "problem".
        My job during the hiring process is to filter them.
        But that's me. Other companies might be interested.
        I often choose to work on non-cookie-cutter products, so it's better to have developers with more curiosity to ask questions, like yourself asked above.
      - findingMeaning 39 days ago
        These people ganging up on you, felt really bad because I support your claim.
        Let me help you with a context where LLMs actually shine and are a blessing. I think it is also the same with Karpathy, who comes from research.
        In any research, replicating a paper is a wildly difficult task. It takes 6-24 months of dedicated work across an entire team to replicate a good research paper.
        Now, there is a reason why we want to do it. Sometimes the solution actually lies in the research. Most of research is experimental and garbage code anyway.
        For each of us working in research, an LLM is a blessing because of the rapid prototyping it provides.
        Then there are research engineers whose role is to apply research to production code. We as research engineers really don't care about the popular library. As long as something does the job, we will just roll with it.
        The reason is simple because there is nothing out there that solved the problem.
        As we move further from research, the tools we build will find all sort of issues and we improve on them.
        Idk about what people think about webdev, but this has been my perspective in SWE in general.
        Most of the webdevs here who are coping with the fact that their react skill matters are quite delusional because they have never traversed the stack down to foundation. It doesn't matter how you render the document as long as you render it.
        Every abstraction originates from research and some small proof of concept. You might reinvent abstraction, but when the cost of reinventing it is essentially zero then you are stifling your own learning because you are choosing to exploit vs choosing to explore.
        There is a balance and good engineers know it. Perhaps all of the people who ganged up on you never approached their work this way.
  - majormajor 39 days ago
    > If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
    for simple stuff, sure, React was ALWAYS inefficient. Even Javascript/client-side logic is still overkill a lot of the time except for that pesky "user expectations" thing.
    for any codebase that's long-lived and complex, combinatorics tells us how it'll be near-impossible to have good+fast test coverage on all that.
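    A rough sketch of the combinatorics argument, under made-up numbers: if behaviour depends on n independent on/off conditions, exhaustive coverage needs 2**n cases. The function names, the flag count, and the test throughput below are hypothetical, chosen only to show how fast the count grows.

```python
SECONDS_PER_DAY = 86_400


def flag_combinations(n_flags: int) -> int:
    """Number of distinct states for n independent boolean conditions."""
    return 2 ** n_flags


def exhaustive_runtime_days(n_flags: int, tests_per_second: float) -> float:
    """Days needed to run one test per combination at a given throughput."""
    return flag_combinations(n_flags) / tests_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed workload: 30 conditions, 1000 tests per second.
    print(flag_combinations(30))
    print(round(exhaustive_runtime_days(30, 1000.0), 1))
```

    With those assumed numbers the exhaustive run takes on the order of twelve days, and each additional condition doubles it.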
    part of the reason people don't roll their own is because being able to assume that the library won't have major bugs leads to an incredible reduction in necessary test surface, and generally people have found it a safe-enough assumption.
    throwing that out and trying to just cover the necessary stuff instead - because you're also throwing out your ability to quickly recognize risky changes since you aren't familiar with all the code - has a high chance of painting you into messy corners.
    "just hire a thousand low-skilled people and force them to write tests" had more problems as a hiring plan than just "people are expensive."
  - casualscience 39 days ago
    If you work at a megacorp right now, you know what's happening isn't people deciding to use less libraries. It's developers being measured by their lines of code, and the more AI you use the more lines of code and 'features' you can ship.
    However, the quality of this code is fucking terrible, no one is reading what they push deeply, and these models don't have enough 'sense' to make really robust and effective test suites. Even if they did, a comprehensive test suite is not the solution to poorly designed code, it's a band aid -- and an expensive one at scale.
    Most likely we will see some disasters happening in the next few years due to this mode of software development, and only then will people understand to use these agents as tools and not replacements.
    ...Or maybe we'll get AGI and it will fix/maintain the trash going out there today.
  - starkparker 39 days ago
    I'd rather use React than a bespoke solution created by an ephemeral agent, and I'd rather self-trepanate than use React
  - jayd16 39 days ago
    Why would I want to maintain in perpetuity random snippets when a library exists? How is that an improvement?
    [-]
    - simonw 39 days ago
      It's an improvement if that library stops being actively maintained in the future.
      ... or decides to redesign the API you were using.
      [-]
      - skylurk 39 days ago
        Are you referring to httpx? ;)
  - akoboldfrying 39 days ago
    > and it can maintain a test suite that shows everything works correctly
    Are you able to efficiently verify that the test suite is testing what it should be testing? (I would not count "manually reviewing all the test code" as efficient if you have a similar amount of test code to actual code.)
    Sometimes a change to the code under test means that a (perhaps unavoidably brittle) test needs to be changed. In this case, the LLM should change the test to match the behaviour of the code under test. Other times, a change to the code under test represents a bug that a failing test should catch -- in this case, the LLM should fix the code under test, and leave the test unchanged. How do you have confidence that the LLM chooses the right path in each case?
  - briandw 39 days ago
    I've come to a similar conclusion. One example is how much easier it is to put an interface on top of sqlite. I've been burned badly with the hidden details of ORMs. ORMs are the siren's call of getting rid of all that boilerplate code when encoding and decoding objects into a db. However this abstraction breaks in many hidden ways. Lazy loading details, in-memory state vs db mismatch, cascading details, etc all have unexpected problems that can be hard to predict. Using an LLM to do the grunt work lets you easily see and reason about all the details. You don't have to guess about what's happening and you can make your own choices.
  - avaika 39 days ago
    I don't trust LLM enough to handle the maintenance of all the abstraction buried in react / similar library. I caught some of the LLMs taking nasty shortcuts (e.g. removing test constraints or validations in order to make the test green). Multiple times. Which completely breaks trust.
    And if I have to closely supervise every single change, I don't believe my development process will be any better. If not worse.
    Let alone new engineers who join the team and all of a sudden have to deal with a unique solution layer which doesn't exist anywhere else.
  - nkrisc 39 days ago
    If LLMs are that capable, then why are AI companies selling access to them instead of using them to conquer markets?
    [-]
    - tfirst 39 days ago
      The same question might be asked about ASML: if ASML EUV machines are so great, why does ASML sell them to TSMC instead of fabbing chips themselves? The reality is that firms specialize in certain areas, and may lose their comparative advantage when they move outside of their specialty.
    - lithocarpus 39 days ago
      I would guess fear of losing market share and valuable data, as well as pressure to appear to be winning the AI race for the companies' own stock price.
      i.e. competition. If there were only one AI company, they would probably not release anything close to their most capable version to the public, à la Google pre-ChatGPT.
      [-]
      - tjr 39 days ago
        I’m not sure that really answers the question? Or perhaps my interpretation of the question is different.
        If (say) the code generation technology of Anthropic is so good, why be in the business of selling access to AI systems? Why not instead conquer every other software industry overnight?
        Have Claude churn out the best office application suite ever. Have Claude make the best operating system ever. Have Claude make the best photo editing software, music production software, 3D rendering software, DNA analysis software, banking software, etc.
        Why be merely the best AI software company when you can be the best at all software everywhere for all time?
        [-]
        sod22 39 days ago
        I'm waiting for people to realise that software products are much more than just lines of code.
        Getting sick and tired of people talk about their productivity gains when not much is actually happening out there in terms of real value creation.
        [-]
        pertymcpert 39 days ago
        Just because you don't see it or refuse to believe people doesn't make you right and them liars. Maybe you're just wrong.
        [-]
        sod22 39 days ago
        Or maybe I’m just right and you’re just slow at seeing what other people can see.
        I’m not a SWE either fyi. Therefore I have no vested interest.
    - nl 39 days ago
      Because the LLMs have only got this good 3 months ago, and market dynamics mean they can't hold them in house without their competitors getting ahead.
  - losvedir 39 days ago
    Huh, I've been assuming the opposite: better to use React even if you don't need it, because of its prevalence in the training data. Is it not the case that LLMs are better at standard stacks like that than custom JS?
    [-]
    - simonw 39 days ago
      Hard to say for sure. I've been finding that frontier LLMs write very good code when I tell them "vanilla JS, no React" - in that their code matches my personal taste at least - but that's hardly a robust benchmark.
  - oulipo2 39 days ago
    That's a fundamental misunderstanding
    The role of abstractions *IS* to prevent (eg "compress") the need for a test suite, because you have an easy model to understand and reason about
    [-]
    - simonw 39 days ago
      One of my personal rules for automated test suites is that my tests should fail if one of the libraries I'm using changes in a way that breaks my features.
      Makes upgrading dependencies so much less painful!
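      A minimal sketch of that rule, assuming a hypothetical feature `build_search_url` and using the standard library's `urlencode` as a stand-in for a third-party dependency. The test goes through the feature rather than the library call, so it fails if the dependency's behaviour changes in a way the feature cares about.

```python
from urllib.parse import urlencode  # stand-in for a third-party dependency


def build_search_url(query: str) -> str:
    """Hypothetical feature that relies on the dependency's escaping rules."""
    return "https://example.com/search?" + urlencode({"q": query})


def test_search_url_is_encoded_the_way_the_feature_needs() -> None:
    # Pins the end-to-end result: if an upgrade changed how spaces or "&"
    # are escaped, this assertion would fail before the feature ships.
    assert build_search_url("a b&c") == "https://example.com/search?q=a+b%26c"
```

      A runner such as pytest would pick this up by its `test_` prefix; calling the function directly works too.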
      [-]
      - oulipo2 39 days ago
        Of course, but this is largely unmaintainable, shifting the responsibility of correctness check from libraries to users. That's why we modularize/abstract/simplify, in order to minimize the need for actual checks
  - api 39 days ago
    Nutty idea: train on ASM code. Create an LLM that compiles prompts directly to machine code.
  - cyberax 39 days ago
    The problem is, what do you do _when_ it fails? Not "if", but "when".
    Can you manually wade through thousands of functions and fix the issue?
  - godelski 39 days ago
    > I'm now incentivized to use less abstractions.
    I'd argue it's a different category of abstraction
- kace91 39 days ago
  Our industry wants disruption, speed, delivery! Automatic code generation does that wonderfully.
  If we wanted safety, stability, performance, and polish, the impact of LLMs would be more limited. They have a tendency to pile up code on top of code.
  I think the new tech is just accelerating an already existing problem. Most tech products are already rotting, take a look at windows or iOS.
  I wonder what will it take for a significant turning point in this mentality.
  [-]
  - rgreeko42 39 days ago
    disruption is a code word for deregulation, and deregulation is bad for everyone except execs and investors
    [-]
    - rambojohnson 39 days ago
      it's sadly telling how this comment got greyed out to oblivion.
  - ip26 39 days ago
    One possible positive outcome of all this could be sending LLMs to clean up oceans of low value tech debt. Let the humans move fast, let the machines straighten out and tidy up.
    The ROI of doing this is weak because of how long it takes an expensive human. But if you could clean it up more cheaply, the ROI strengthens considerably- and there’s a lot of it.
- Q6T46nT668w6i3m 39 days ago
  It’s wild that programmers are willing to accept less determinism.
  [-]
  - viraptor 39 days ago
    It's not something that suddenly changed. "I'll generate some code" is as nondeterministic as "I'll look for a library that does it", "I'll assign John to code this feature", or "I'll outsource this code to a consulting company". Even if you write yourself, you're pretty nondeterministic in your results - you're not going to write exactly the same code to solve a problem, even if you explicitly try.
    [-]
    - Night_Thastus 39 days ago
      No?
      If I use a library, I know it will do the same thing from the same inputs, every time. If I don't understand something about its behavior, then I can look to the documentation. Some are better about this, some are crap. But a good library will continue doing what I want years or decades later.
      An LLM can't decide between one sentence and the next what to do.
      [-]
      - viraptor 39 days ago
        The library is deterministic, but looking for the library isn't. In the same way that generating code is not deterministic, but the generated code normally is.
        [-]
        Night_Thastus 39 days ago
        I...guess? But once you know of a good library for problem X, you don't need to look for it anymore. I guess if you have a bunch of developers and 0 control over what they do, and they're free to drag in additional dependencies willy-nilly, then yes, that part isn't deterministic? But that's a much bigger problem than anything library-related...
    - skydhash 39 days ago
      Contrary to code generation, all the other examples have one common point which is the main advantage, which is the alignment between your objective and their actions. With a good enough incentive, they may as well be deterministic.
      When you order home delivery, you don’t care about by who and how. Only the end result matters. And we’ve ensured that reliability is good enough that failures are accidents, not common occurrence.
      Code generation is not reliable enough to have the same quasi deterministic label.
    - leshow 39 days ago
      It's not the same, LLM's are qualitatively different due to the stochastic and non-reproducible nature of their output. From the LLM's point of view, non-functional or incorrect code is exactly the same as correct code because it doesn't understand anything that it's generating. When a human does it, you can say they did a bad or good job, but there is a thought process and actual "intelligence" and reasoning that went into the decisions.
      I think this insight was really the thing that made me understand the limitations of LLMs a lot better. Some people say when it produces things that are incorrect or fabricated it is "hallucinating", but the truth is that everything it produces is a hallucination, and the fact it's sometimes correct is incidental.
      [-]
      - viraptor 39 days ago
        I'm not sure who generates random code without a goal or checking if it works afterwards. Smells like a straw man. Normally you set the rules, you know how to validate if the result works, and you may even generate tests that keep that state. If I got completely random results rather than what I expect, I wouldn't be using that system - but it's correct and helpful almost every time. What you describe is just not how people work with LLMs in practice.
        [-]
        leshow 38 days ago
        I don't think you understood my comment, I didn't say anything about how to use the tool.
        The parent comment was making the case that humans are as non-deterministic as the LLM is, and I was explaining why that is not true.
      - sod22 39 days ago
        Correct. The thing has no concept of true or false. 0 or 1.
        Therefore it cannot necessarily discern between two statements that are practically identical in the eyes of humans. This doesn't make the technology useless but it's clearly not some AGI nonsense.
  - bryanrasmussen 39 days ago
    It's wild that management would be willing to accept it.
    I think that for some people it is harder to reason about determinism because it is similar to correctness, and correctness can, in many scenarios be something you trade off - for example in relation to scaling and speed you will often trade off correctness.
    If you do not think clearly about the difference with determinism and other similar properties like (real-time) correctness which you might be willing to trade off, you might think that trading off determinism is just more of the same.
    Note: I'm against trading off determinism, but I am willing to think there might be a reason to trade it off, just I worry that people are not actually thinking through what it is they're trading when they do it.
    [-]
    - layer8 39 days ago
      Management is used to nondeterminism, because that’s what their employees always have been.
      [-]
      - bryanrasmussen 39 days ago
        hmm, OK good point. But programs that are not deterministic would seem to have a bug that needs fixing. And it can't be fixed, but I guess the employees can't be fixed either.
    - skydhash 39 days ago
      Determinism requires formality (enactment of rules) and some kind of omniscience about the system. Both are hard to acquire. I’ve seen people trying hard not to read any kind of manual and failing to reason logically even when given hints about the solution to a problem.
  - whstl 39 days ago
    Why would the average programmer have a problem with it?
    The average programmer is already being pushed into doing a lot of things they're unhappy about in their day jobs.
    Crappy designs, stupid products, tracking, privacy violation, security issues, slowness on customer machines, terrible tooling, crappy dependencies, horrible culture, pointless nitpicks in code reviews.
    Half of HN is gonna defend one thing above or the other because $$$.
    What's one more thing?
    [-]
    - sod22 39 days ago
      Say it louder.
  - zephen 39 days ago
    There has always been a laissez-faire subset of programmers who thrive on living in the debugger, getting occasional dopamine hits every time they remove any footgun they previously placed.
    I cannot count the times that I've had essentially this conversation:
    "If x happens, then y, and z, it will crash here."
    "What are the odds of that happening?"
    "If you can even ask that question, the probability that it will occur at a customer site somewhere sometime approaches one."
    It's completely crazy. I've had variants on the conversation from hardware designers, too. One time, I was asked to torture a UART, since we had shipped a broken one. (I normally build stuff, but I am your go-to whitebox tester, because I hone in on things that look suspicious rather than shying away from them.) When I was asked the inevitable "Could that really happen in a customer system?" after creating a synthetic scenario where the UART and DMA together failed, my response was:
    "I don't know. You have two choices. Either fix it where the test passes, or prove that no customer could ever inadvertently recreate the test conditions."
    He fixed it, but not without a lot of grumbling.
    [-]
    - Verdex 39 days ago
      My dad worked in the auto industry and they came across a defect in an engine control computer where they were able to give it something like 10 million to one odds of triggering.
      They then turned the thing on, it ran for several seconds, encountered the error, and crashed.
      Oh, that's right, the CPU can do millions of things a second.
      Something I keep in the back of my mind when thinking about the odds in programming. You need to do extra leg work to make sure that you're measuring things in a way that's practical.
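      A rough sketch of that arithmetic, assuming each operation independently hits the defect with probability 1 in 10 million and an illustrative rate of one million operations per second (both figures are assumptions, not details from the story above):

```python
import math

def prob_at_least_one_failure(p_per_op: float, ops: int) -> float:
    """Chance of at least one failure across `ops` independent trials."""
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(ops * math.log1p(-p_per_op))

P_PER_OP = 1e-7              # assumed: "10 million to one" per operation
OPS_PER_SECOND = 1_000_000   # assumed: illustrative execution rate

for seconds in (1, 10, 60):
    ops = seconds * OPS_PER_SECOND
    chance = prob_at_least_one_failure(P_PER_OP, ops)
    print(f"{seconds:>3} s of runtime: {chance:.3f}")
```

      Under these assumptions the chance of a crash is already about 63% after ten seconds, which is why per-operation odds say little until they are multiplied by how often the operation runs.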
    - crystal_revenge 39 days ago
      I've recently had a lot of fun teaching junior devs the basics of defensive programming.
      The phrasing that usually makes it click for them is: "Yes, this is an unlikely bug, but if this bug were to happen, how long would it take you to figure out this is the problem and fix it?"
      In most cases these are extremely subtle issues that the juniors immediately realize would be nightmares to debug and could easily eat up days of hair-pulling work while someone non-technical above them waiting for the solution is rapidly losing their patience.
      The best senior devs I've worked with over my career all have shared an uncanny knack for seeing a problem months before it impacts production. While they are frequently ignored, in those cases more often than not they get an apology a few months down the line when exactly what they predicted would happen, happens.
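      One small illustration of the kind of defensive check being described, as a sketch only: `apply_discount` and its rules are hypothetical, chosen to show a guard that fails loudly at the source instead of letting a bad value surface somewhere far away.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price, rejecting inputs that 'can't happen'."""
    # Hypothetical example: an out-of-range value fails here, with context,
    # rather than producing a negative invoice days later.
    if price_cents < 0:
        raise ValueError(f"price_cents must be >= 0, got {price_cents!r}")
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be in [0, 100], got {percent!r}")
    return price_cents * (100 - percent) // 100
```

      The check costs a few lines now; without it, the unlikely input would have to be traced back from whatever symptom it eventually caused.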
      [-]
      - zephen 39 days ago
        > While they are frequently ignored
        And this is the reason I spent most of the latter part of my career in chip companies.
        Because tapeouts are _expensive_, both in dollar cost, and in lost opportunity cost if the chip comes back broken.
        So any successful chip company knows to pay attention to potential problems. And the messenger never gets shot.
  - tmaly 39 days ago
    I think those that are most successful at creating maintainable code with AI are those that spend more time upfront limiting the nondeterminism aspect using design and context.
  - givemeethekeys 39 days ago
    Mortgages don't pay for themselves.
  - Der_Einzige 39 days ago
    You can have the best of both worlds if you use structured/constrained generation.
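    Constrained generation, very roughly, means restricting each decoding step to tokens that keep the output valid. A toy sketch under loud assumptions: `VOCAB`, `ALLOWED_OUTPUTS`, and `fake_model_scores` are hypothetical stand-ins, and the "model" is just a random scorer, not a real LLM or any library's API.

```python
import random

# Hypothetical vocabulary and a stand-in "model" that scores tokens randomly.
VOCAB = ["yes", "no", "maybe", "{", "}", "banana"]

def fake_model_scores(prefix: list[str], rng: random.Random) -> dict[str, float]:
    return {tok: rng.random() for tok in VOCAB}

# The constraint: the output must be exactly one of these token sequences.
ALLOWED_OUTPUTS = [["{", "yes", "}"], ["{", "no", "}"]]

def allowed_next_tokens(prefix: list[str]) -> set[str]:
    """Tokens that keep the prefix on a path toward some allowed output."""
    return {
        seq[len(prefix)]
        for seq in ALLOWED_OUTPUTS
        if len(seq) > len(prefix) and seq[:len(prefix)] == prefix
    }

def constrained_generate(seed: int) -> list[str]:
    rng = random.Random(seed)
    out: list[str] = []
    while True:
        allowed = allowed_next_tokens(out)
        if not allowed:  # nothing left to add: the output is complete
            return out
        scores = fake_model_scores(out, rng)
        # Mask step: choose the best-scoring token among the allowed ones only.
        out.append(max(allowed, key=lambda tok: scores[tok]))
```

    Which answer comes out still depends on the scores, but the shape of the output cannot drift: "banana" is never emitted, however highly the scorer rates it.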
  - lopatin 39 days ago
    It's not that wild. I like building things. I like programming too, but less than building things.
    [-]
    - Trasmatta 39 days ago
      To me, fighting with an LLM doesn't feel like building things, it feels like having my teeth pulled.
      [-]
      - i_am_a_peasant 39 days ago
        I am still using LLMs just to ask questions and never giving them the keyboard so I haven’t quite experienced this yet. It has not made me a 10x dev but at times it has made me a 2x dev, and that’s quite enough for me.
        It’s like jacking off, once in a while won’t hurt and may even be beneficial. But if you do it constantly you’re gonna have a problem.
  - wiseowise 39 days ago
    > It’s wild that programmers are willing to accept less determinism.
    It's wild that you think programmers is some kind of caste that makes any decisions.
  - dahcryn 39 days ago
    The good ones don't accept. Sadly there's just many more idiots out there trying to make a quick buck
    [-]
    - lazystar 39 days ago
      Delving a bit deeper... I've been wondering if the problem's related to the rise in H1B workers and contractors. These programmers have an extra incentive to avoid pushing back on c-suite/skip level decisions - staying out of in-office politics reduces the risk of deportation. I think companies with a higher % of engineers working with that incentive have a higher risk of losing market share in the long-term.
      [-]
      - doug_durham 39 days ago
        I’ll answer that with a simple “No”. My H1B colleagues are every bit as rigorous and innovative as any engineer. It is in no one’s long term interest to generate shoddy code.
        [-]
        lazystar 39 days ago
        I'm not stating the code is shoddy - I agree the quality's fine. I'm referring to the IC engineer's role in pushing back against unrealistic demands/design decisions that are passed down by the PM's and c-suite teams. Doing this can increase internal tension, but it makes the product and customer experience better in the long run. In my career, I've felt safe pushing back because I don't have to worry about moving if my pushback is poorly received.
  - contravariant 39 days ago
    I mean we've had to cope with users for ages, this is not that different.
  - baq 39 days ago
    This gets repeated all the time, but it’s total nonsense. The output of an LLM is fixed just as the output of a human is.
- exssss 39 days ago
  Out of curiosity, what did you pivot to?
  It sounds crazy to say this, but I've been thinking about this myself. Not for the immediate future (eg 2026), but somewhere later.
- teleforce 39 days ago
  This whole phenomenon of AI-assisted and vibe coding, along with the other comments here, reminds me of this very popular post that keeps reappearing on HN almost every year [1],[2].
  [1] Don't Call Yourself A Programmer, And Other Career Advice:
  https://www.kalzumeus.com/2011/10/28/dont-call-yourself-a-pr...
  [2] Don't Call Yourself A Programmer, And Other Career Advice (2011):
  https://news.ycombinator.com/item?id=34095775
- scellus 39 days ago
  My work is better than it has been for decades. Now I can finally think and experiment instead of wasting my time on coding nitty-gritty detail, impossible to abstract. Last autumn was the game changer, basically Codex and later Opus 4.5; the latter is good with any decent scaffolding.
  [-]
  - chasd00 39 days ago
    I have to admit, LLMs do save a lot of typing and associated syntax errors. If you know what you want and can spot and fix mistakes made by the LLM then they can be pretty useful. I don’t think it’s wise to use them for development if you are not knowledgeable enough in the domain and language to recognize errors or dead ends in the generated code though.
- jsk2600 39 days ago
  What are you pivoting to?
  [-]
  - coldpie 39 days ago
    I'm also interested in hearing this.
    For me, I'm planning to ride out this industry for another couple years building cash until I can't stand it, then pivot to driving a city bus.
    [-]
    - baq 39 days ago
      Gardening and plumbing. Driving buses will be solved.
      [-]
      - Buttons840 39 days ago
        Plumbing seems like a relatively popular AI-proof pivot. If AI really does start taking jobs en masse, then plumbers are going to be plentiful and cheap.
        What we really need is a lot more housing. So construction work is a safer pivot. But construction work is difficult and dangerous and not something everyone can do. Also, society will collapse (apparently) if we ever make housing affordable, so maybe the powers-that-be won't allow an increase in construction work, even if there are plenty of construction workers.
        Who knows... interesting times.
    - layer8 39 days ago
      > then pivot to driving a city bus.
      You seem to be counting on Waymo not obsoleting that occupation. ;)
- lo_zamoyski 39 days ago
  > It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
  The ubiquitous adoption of LLMs for generating code is mostly a sign of bad abstraction or the absence of abstraction, not the excess of abstraction.
  And choosing/making the right abstraction is kind of the name of the game, right? So it's not abstraction per se that's a problem.
- zx8080 39 days ago
  That's similar to what happened in Java enterprise stack: ...wrapper and ...factory classes and all-you-can-eat abstractions that hide implementation and make engineering crazy expensive while not adding much (or anything, in most cases) to product quality. Now the same is happening in work processes with agentic systems and workflows.
- dandanua 39 days ago
  Don't forget you are expected to deliver x10 for the same pay, "because you have the AI now".
  [-]
  - baq 39 days ago
    The system is designed to do exactly that. This is called ‘productivity increase’ and is deflationary in large dosages. Deflation sounds good until you understand where it’s coming from.
- karakoram 32 days ago
  A career pivot sounds interesting. Any ideas or recommendations for others considering this? I have seen someone leave SWE to become a commercial pilot which was pretty cool.
- kayo_20211030 39 days ago
  Could we all just agree to stop using the term "abstraction"? It's meaningless and confusing. It's cover for a multitude of sins, because it really could mean anything at all. Don't lay all the blame on the c-suite; they are what they are, and have their own view. Don't moan about the latest egregious excess of some LLM. If it works for you, use it; if it doesn't, don't. But stop whinging.
- aleph_minus_one 39 days ago
  > It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
  No profession collectively made such a decision. Programming has always been split into many, many subcultures, each with its own ideas (mutually incompatible across the profession) of what makes a good program.
  So I guess it was rather some programmers inside some part of a Silicon Valley echo chamber, in which you also live, who made such a decision.
- christophilus 39 days ago
  What are you pivoting to?
- AndrewKemendo 39 days ago
  Every technical person has been complaining about this for the entire history of computer programming
  Unless you’re writing literal memory instructions then you’re operating on between 4 and 10 levels of abstraction already as an engineer
  It has never been tractable for humans to program a series of switches without incredible number of abstractions
  The vast majority of programmers never understood how computers work to begin with
  [-]
  - Trasmatta 39 days ago
    People keep making this argument, but the jump to LLM driven development is such a conceptually different thing than any previous abstraction
  - casey2 39 days ago
    This is true, though the people that actually push the field forward do know enough about every level of abstraction to get the job done. Making something (very important) horrible just to rush to market can be a pretty big progress blocker.
    Jensen is someone I trust to understand the business side and some of those lower technical layers, so I'm not too concerned.
  - fwip 39 days ago
    And if you're writing machine code directly, you're still relying on about ten layers of abstraction that the wizards at the chip design firms have built for you.
- akulbe 39 days ago
  What are you pivoting to?
- godelski 39 days ago
  > the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
  I've usually found complaints about abstraction in programming odd because frankly, all we do is abstraction. It often seems to be used to mean /I/ don't understand, therefore we should do something more complicated and with many more lines of code that's less flexible.
  But this usage? I'm fully on board. Too much abstraction is when it's incomprehensible. To who is the next question (my usual complaint is that a junior should not be that level) and I think you're right to point out that the "who" here is everyone.
  We're killing a whole side of creativity and elegance while only slightly aiding another side. There's utility to this, but also a cost.
  I think what frustrates me most about CS is that as a community we tend to go all in on something. We went all in on VR, then crypto, and now AI. We should be trying new things, but it feels more like we take these sides as if they're objective and anyone not hopping on the hype train is an idiot or a Luddite. The way the whole industry jumps to these things just feels more like FOMO than intelligent strategy. Like making a sparkling water company an "AI first" company... it's like we love solutions looking for problems.
- casey2 39 days ago
  So you're washing dishes now?
xzkll 42 days ago
Does it bother any of you that you now have to pay money in order to do your job? I mean AI model subscriptions. Somehow it feels wrong for me to pay for tools that are trying to replace me.
[-]
- 2sk21 39 days ago
  IDEs used to be extremely expensive back in the 1990s. IDEs such as Microsoft Visual Studio and IBM's VisualAge for Java were quite expensive subscriptions, as I recall. Subsequently, open source IDEs like Eclipse and VisualStudio seem to have become the norm.
  [-]
  - abeyer 39 days ago
    Visual Studio has never been open source, though some of the underlying build tools and compilers are.
    Visual Studio Code is a different thing... and claims to be open source, but by intent and approach really is closer to source available.
  - qingcharles 39 days ago
    I used to pay a fortune for a full Visual Studio + full MSDN experience every year until I eventually earned a free ride.
    Wild how much you can get for free now. Amazing free IDEs. Every LLM offers excellent free plans if you are on a zero budget.
    $10/mo GitHub Copilot is an absurd deal that has to be a loss in terms of pure compute cost.
  - aeonik 39 days ago
    Compilers and programming languages themselves used to be hideously expensive as well.
- threetonesun 39 days ago
  Between subscription software and subscription AI and the rising prices of computer hardware, the idea of a "personal computer" is quickly dying.
  [-]
  - Aldipower 39 days ago
    Not for me.
- wild_egg 39 days ago
  Your employer is not paying for these things?
- Aozora7 39 days ago
  I'm not paying for an AI subscription to do my job in the same way I don't pay for the IDE I use. My employer does.
- lostmsu 38 days ago
  You can use open models for free. They perform at roughly last year's level or better.
- mptest 41 days ago
  paying to train* and fund the research for the tools to replace us
BhavdeepSethi 39 days ago
Most of the folks that are talking about this are the ones who work independently and work on greenfield projects (especially tooling related). The cost of making a mistake there is so low. I've used it similarly and it's absolutely amazing. Though I still use a mix of agents and code myself in my regular 9-5 job.
I've yet to see examples of folks using this in a team of 4+ folks working together in a production env with users, and just using AI for their regular development.
Claude code creator only using claude code doesn't count. That's more like dog-fooding.
[-]
- William_BB 39 days ago
  Yeah. It's pretty telling to look at profiles of people who replied to his tweet.
- yeasku 38 days ago
  I have seen it. And it is a mess.
  It's not only that the code quality is bad; to be fair, in most projects it is.
  The biggest problem is every single component of the stack uses different conventions and names for everything.
  When nobody looks at the code, naming things becomes harder until everything is <generic name>.
flumpcakes 39 days ago
> There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and ...
This sounds unbearable. It doesn't sound like software development, it sounds like spending a thousand hours tinkering with your vim config. It reminds me of the insane patchwork of sprawl you often get in DevOps - but now brought to your local machine.
I honestly don't see the upside, or how it's supposed to make any programmer worth their weight in salt 10x better.
[-]
- globnomulous 39 days ago
  > This sounds unbearable.
  I can't see the original post because my browser settings break Twitter (I also haven't liked much of Karpathy's output), but I agree. I call this style of software development 'meeting-based programming,' because that seems to be the mental model that the designers of the tools are pursuing. This probably explains, in part, why c-suite/MBA types are so excited about the tools: meetings are how they think and work.
  In a way LLMs/chatbots and 'agents' are just the latest phase of a trend that the internet has been encouraging for decades: the elimination of mental privacy. I don't mean 'privacy' in an everyday sense -- i.e. things I keep to myself and don't share. I mean 'privacy' in a more basic sense: private experience -- sitting by oneself; having a mental space that doesn't include anybody else; simply spending time with one's own thoughts.
  The internet encourages us to direct our thoughts and questions outward: look things up; find out what others have said; go to wikipedia; etc. This is, I think, horribly corrosive to the very essence of being a thinking, sentient being. It's also unsurprising, I guess. Humans are social animals. We're going to find ourselves easily seduced by anything that lets us replace private experience with social experience. I suppose it was only a matter of time until someone did this with programming tools, too.
  [-]
  - ewoodrich 39 days ago
    https://xcancel.com/karpathy/status/2004607146781278521
    (FYI: you can easily bypass the awful logged out view by replacing x.com with xcancel.com, I use a URL Autoredirector rule to do it automatically in Chromium browsers)
    [-]
    - reconnecting 39 days ago
      Awesome hint!
  - ctmnt 39 days ago
    Use a Nitter mirror [1]. I find xcancel.com the easiest to get to:
    https://xcancel.com/karpathy/status/2004607146781278521
    [1] https://github.com/zedeus/nitter/wiki/Instances
- wakeywakeywakey 39 days ago
  > ... or how it's supposed to make any programmer worth their weight in salt 10x better.
  It doesn't. The only people I've seen claim such speedups are either not generally fluent in programming or stand to benefit financially from reinforcing this meme.
  [-]
  - alexjplant 39 days ago
    For every conspicuous vibecoding influencer there are a bunch of experienced software engineers using them to get things done. The newest generation of models are actually pretty decent at following instructions and using existing code as a template. Building line-of-business apps is much quicker with Claude Code because once you've nicely scaffolded everything you can just tell it to build stuff and it'll do so the same way you would have in a fraction of the time. You can also use it to research alternatives to architectural approaches and tooling that you come up with so that you don't paint yourself into a corner by having not heard about some semi-niche tool that fits your use case perfectly.
    Of course I wouldn't use an LLM to #yolo some Next.js monstrosity with a flavor-of-the-week ORM and random Tailwind. I have, however, had it build numerous parts of my apps after telling it all about the mise targets and tests and architecture of the code that I came up with up front. In a way it vindicates my approach to software engineering because it's able to use the tools available to it to (reasonably) ensure correctness before it says it's done.
  - zarzavat 39 days ago
    The speedup from AI is in the exponent.
    Just the other day ChatGPT implemented something that would have taken me a week of research to figure out: in 10 minutes. What do you call that speedup? It's a lot more than 10x.
    On other days I barely touch AI because I can write easy code faster than I can write prompts for easy code, though the autocomplete definitely helps me type faster.
    The "10x" is just a placeholder for averaging over a series of stochastic exponents. It's a way of saying "somewhere between 1 and infinity"
    [-]
    - flumpcakes 39 days ago
      > Just the other day ChatGPT implemented something that would have taken me a week of research to figure out: in 10 minutes. What do you call that speedup? It's a lot more than 10x.
      Can you share what exactly this was? Perhaps I don't do anything exciting or challenging, but personally this hasn't happened to me so I find it hard to imagine what this could be.
      Instead of AI companies talking about their products, I think the thing to really sell it for me would be an 8 hour long video of an extremely proficient programmer using AI to build something that would have taken them a very long time if they were unassisted.
      [-]
      - zarzavat 39 days ago
        Sure. I needed to draw some parametric and smooth Bézier curves. LLMs are beasts at figuring out the appropriate equations. It would have taken me forever to work out where all the control points should go.
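        For readers wondering what "working out where the control points go" can involve, here is a sketch of one common approach, not necessarily what was generated in this case: converting a uniform Catmull-Rom segment into cubic Bézier control points, so the curve passes smoothly through given points. The function names are illustrative.

```python
Point = tuple[float, float]

def catmull_rom_to_bezier(p0: Point, p1: Point, p2: Point, p3: Point):
    """Bézier control points for the smooth segment from p1 to p2.

    p0 and p3 are the neighbouring points; they only shape the tangents.
    """
    b1 = (p1[0] + (p2[0] - p0[0]) / 6, p1[1] + (p2[1] - p0[1]) / 6)
    b2 = (p2[0] - (p3[0] - p1[0]) / 6, p2[1] - (p3[1] - p1[1]) / 6)
    return p1, b1, b2, p2

def cubic_bezier(b0: Point, b1: Point, b2: Point, b3: Point, t: float) -> Point:
    """Evaluate the cubic Bézier curve at parameter t in [0, 1]."""
    u = 1 - t
    w0, w1, w2, w3 = u**3, 3 * u * u * t, 3 * u * t * t, t**3
    return (
        w0 * b0[0] + w1 * b1[0] + w2 * b2[0] + w3 * b3[0],
        w0 * b0[1] + w1 * b1[1] + w2 * b2[1] + w3 * b3[1],
    )

# Usage: one smooth segment between the middle two of four sample points.
controls = catmull_rom_to_bezier((0, 0), (1, 1), (2, 2), (3, 3))
midpoint = cubic_bezier(*controls, 0.5)
```

        Chaining this over consecutive groups of four points gives a curve whose tangents match where the segments join, which is the "smooth" part of the problem.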
  - johnfn 39 days ago
    I am a professional engineer with around 10 years of experience and I use AI to work about 5x faster on a site I personally maintain (~100 DAU, so not huge, but also not nothing). I don’t work in AI so I get no financial benefit by “reinforcing this meme”.
    [-]
    - danpalmer 39 days ago
      Same position, different results. I'm maybe 20% faster. Writing the code is rarely the bottleneck for me, so there's limited potential in that way. When I am writing the code, things that I'd find easy and fast are a little faster (or I can leave AI doing them). Things that are hard and slow are nearly as hard and nearly as slow when using AI, I still need to maintain most of the code in my head that I'd need to without AI, because it'll get things wrong so quickly.
      I think what you're working on has a huge impact on AI's usability. If you're working on things that are simple conceptually and simple to implement, AI will do very well (including handling edge cases). If it's a hard concept, but simple execution, you can use AI to only do the execution and still get a pretty good speed boost, but not transformational. If it's a hard concept and a hard execution (as my latest project has been), then AI is really just not very good at it.
    - leshow 39 days ago
      Oh, well if it can generate some simple code for your personal website, surely it can also be the "next level of abstraction" for the entirety of software engineering.
      [-]
      - johnfn 39 days ago
        Well, I don’t really think it’s “simple”. The code uses React, nodejs, realtime events pushed via SSE, infra pushed via Terraform, postgres, blob store on S3, emails sent with SES… sure, it’s not the next Google, but it’s a bit above, like, a personal blog.
        And in any case, you are moving goalposts. OP said he had never seen anyone serious claim that they got productivity gains from AI. When I claim that, you say “well it’s not the next level of abstraction for all SWE”. Obviously - I never claimed that?
        [-]
        leshow 39 days ago
        If you want my opinion, I think LLMs can be pretty good at generating simple code for things you can find on stackoverflow and require minor adjustments. Even then, if you don't really understand the code you can have major issues.
        Your site is case in point of why LLMs demo well but kind of fall apart in the real world. It's pretty good at fitting lego blocks together based on a ton of work other people have put into React and node or the SSE library you used, etc. But that's not what Karpathy is saying, he's saying "the hottest programming language is english".
        That's bonkers. In my experience it can actually slow you down as much as speed you up, and when you try to do more complicated things it falls apart.
        [-]
        johnfn 39 days ago
        I don't really see how my site is "falling apart in the real world". It is a real site used by real people in the real world. It is not falling apart.
        [-]
        leshow 38 days ago
        I am agreeing with you, LLMs can be useful for simple code generation where you're primarily plugging existing components together.
        [-]
        johnfn 37 days ago
        Again, that’s not what my website is. It’s not simple code generation and I’m not just plugging things together.
      - gejose 39 days ago
        > on a site I personally maintain (~100 DAU, so not huge, but also not nothing)
        This is what the parent said.
        > some simple code for your personal website
        This is your (reductive) characterization of their work. That's fine, but please keep in mind that that's your inference, not what the parent said.
    - iLoveOncall 39 days ago
      > either not generally fluent in programming or stand to benefit financially from reinforcing this meme
      Then figure out which one of the two you are. Years of experience have never equated to competence.
      [-]
      - johnfn 39 days ago
        Blindly asserting that everyone who disagrees with you is a shill or incompetent seems unlikely to be conducive to good discourse.
        [-]
        iLoveOncall 38 days ago
        You seem to say that it's not the truth. I disagree.
  - qingcharles 39 days ago
    Practically every post on HN that mentions AI now ends up with a thread that is "I get 100X speed-up using LLMs" vs. "It made me slower and I've never met a single person in real life who has worked faster with AI."
    I'm a half-decent developer with 40 years experience. AI regularly gives me somewhere in the range of 10-100X speed-up of development. I don't benefit from a meme, I do benefit from better code delivered faster.
    Sometimes AI is a piece of crap and I work at 0.5X for an hour flogging a dead horse. But those are rarer these days.
    [-]
    - flumpcakes 39 days ago
      I've posted this on another comment verbatim that was similar to yours, so apologies for the copy and paste:
      Can you share what exactly this was (that got you the 10-100x speedup)? Perhaps I don't do anything exciting or challenging, but personally this hasn't happened to me so I find it hard to imagine what this could be.
      Instead of AI companies talking about their products, I think the thing to really sell it for me would be an 8 hour long video of an extremely proficient programmer using AI to build something that would have taken them a very long time if they were unassisted.
      [-]
      - cheevly 39 days ago
        I would love to make these videos for you if you want to pay for my time. Drop me an email at josh.d.griffith at gmail and tell me what you want to see and compensate. I can vibe code at any scale.
        [-]
        flumpcakes 39 days ago
        I assume this is a reply in jest :)
        > I can vibe code at any scale.
        That's the thing - I know what 'vibe coding' is because that's pretty much how I use AI, as an exploratory tool or interactive documentation or a search engine for topics I want surface level information about.
        It does not make me a 10x-100x more efficient. It's a toy and a learning tool. It could be replaced or removed and I wouldn't miss it that much.
        Clearly I am missing something. I care about quality software, so if it's making someone 100x more productive but they're producing the same subpar nonsense they would anyway then I am not interested. Hence I want to see a really proficient programmer use it, be 10x+ more productive, and have a quality product at the end. That's what I want to see demonstrated.
  - packetlost 39 days ago
    Our ops guy has thrown together several buggy dashboards using AI tools. They're passable but impossible to maintain.
    [-]
    - flumpcakes 39 days ago
      I personally think that everyone knows AI produces subpar code, and that the infallible humans are just passing it along because they don't understand/care. We're starting to see the gaslighting now, it's not that AI makes you better, it's that AI makes you ship faster, and now shipping faster (with more bugs) is more important because "tech debt is an appreciating asset" in the world where AI tools can pump out features 10x faster (with the commensurate bugs/issues). We're entering the era of "move fast and break stuff" on steroids. I miss the era of software that worked.
      [-]
      - psidium 39 days ago
        Yep, bugs are already just another cost of doing business for companies that aren’t user-focused. We can expect buggier code from now on. Especially for software where the users aren’t the ones buying it.
        Disclaimer because I sound pessimistic: I do use a lot of AI to write code.
        I do feel behind on the usage of it.
        [-]
        packetlost 39 days ago
        I really wish we would shift back towards quality and reliability being major selling points in software. There's only a handful of projects I'm aware of that emphasize it and they're both pleasures to use: Obsidian (note app) and Linear (ticket tracking).
- qudat 39 days ago
  As far as I can tell as a heavy coding agent user: you don’t need to know any of this and that’s a testament to how good code agent TUIs have become. All I do to be productive with a coding agent is tell it to break a problem down into tasks, store it inside beads, and then make sure each step is approved by me. I also add in a TDD requirement where it needs to build tests that fail then eventually pass.
  Everything else I’ve used has been over engineered and far less impactful. What I just said above is already what many of us do anyway.
  [-]
  - halfmatthalfcat 39 days ago
    This sounds like my complete and utter nightmare. No art or finesse in building the thing - only an exercise in torturing language to someone who at a fundamental level doesn't understand a thing.
    [-]
    - baq 39 days ago
      Nothing stopping you from hand sculpting software like we did in the before times.
      Mass production however won’t stop, it’s barely started literally a couple months ago and it’s the slowest and worst it’ll ever be.
      [-]
      - halfmatthalfcat 39 days ago
        I'm not viewing AI tooling as an extinction of the art of programming, only illuminating how telling an AI how to create programs isn't in the same universe as programming, where the technical skill to do such a thing is on par with punching in how long my microwave should nuke my popcorn.
        [-]
        qingcharles 39 days ago
        This isn't my experience. It's more like discussing with another skilled developer on my team how we should code the solution, what APIs we should use, what techniques, what algorithms. Firing ideas back and forth until we settle on a reasonable plan of attack. That plan usually consists of a mix of high level ideas and chunks of example code.
      - saulpw 39 days ago
        I keep hearing "it's the slowest and worst it'll ever be" as though software ability and performance only ever increase and yet mass produced software is slower and enshittier than it was 10-15 years ago and we're all complaining about it. And you can't say "but it does so much more" because I never asked for 90% of the "more" and just want to turn most of it off.
        [-]
        strange_quark 39 days ago
        I’m also not convinced that any of these models are going to stick around at the same level once the financial house of cards they’re built on comes tumbling down. I wonder what the true cost of running something like Claude opus is, it’s probably unjustifiably expensive. If that happens, I don’t think this stuff is going to completely disappear but at some point companies are going to have to decide which parts are valuable and jettison the rest.
        [-]
        qingcharles 39 days ago
        It definitely feels like we're living in the golden time when all the LLMs are getting massively subsidized. You could just tab between all the free accounts all day right now and still get some amazing code results without paying a dime.
        flumpcakes 39 days ago
        I can think of a few things that could happen to sink "it's the slowest and worst it'll ever be". Even ignoring things that could happen, I think in general we're hitting a ceiling with LLMs. All the annoyances and bugs and frankly incompetence with the current models are not going away soon, despite $tn of investments. At this point it is now just about propping up this bubble so the USA doesn't have another big recession.
    - qudat 39 days ago
      I don’t really understand how you got that from my post. I can and do drop in to refactor or work on the interesting parts of a project. At every checkpoint where I require a review I can and do make modifications by hand.
      Are you complaining about code formatters or auto fix linters? What about codegen based on API specs? A code agent can do all of those and more. It can do all the boring parts while I get to focus on the interesting bits. It’s great.
      Here’s another fantastic use case: have an agent gen the code, think about its prototype, delete, and then rewrite it. I did that on a project with huge success: https://github.com/neurosnap/zmx
    - senordevnyc 39 days ago
      Not really at all like this, more like being a tech lead for a team of savants who simultaneously are great at parts of software engineering, and limited at others. Though that latter category is slimmer than a year ago…
      The point is, you can get lots of quality work out of this team if you learn to manage them well.
      If that sounds like a “complete and utter nightmare”, then don’t use AI. Hopefully you can keep up without it in the long run.
  - tehnub 39 days ago
    I predict by the end of next year we will have our AIs write TPS reports.
  - tehnub 39 days ago
    Beads?
    [-]
    - cygn 39 days ago
      https://github.com/steveyegge/beads
- timcobb 39 days ago
  > This sounds unbearable. It doesn't sound like software development, it sounds like spending a thousand hours tinkering with your vim config
  Before LLM programming, this was at least 30-50% of my time spent programming, fixing one config and build issue after another. Now I can spend way more time thinking about more interesting things.
gghffguhvc 39 days ago
My company takes between Christmas and New Year’s off. I took a week before that off too. I have not used AI in that time. The slower pace of life is amazing. But when I get back to coding it will be back to running at 180%. It’s the new norm. However I’ve decided to take longer “no computer” breaks in my day. I have to adapt but I need to defend my “take it slow” times and find some analogue hobbies. The shift is real and you can’t wind it back.
[-]
- sshine 39 days ago
  I’ve been taking my son for stroller walks more often over Christmas. I bring a headset for listening to music, podcasts, audiobooks, tech talks. “Be effective.” But I end up just walking and thinking, realising this is “free time”.
  It sounds ridiculous, and it’s easy to say, but spending time walking and thinking will improve your decisions and priorities in a way that no productivity hack will.
  I only actually did slow down for a while because I had to for the well-being of my family. Sure feels important to not always be on top of everyone else’s business.
justatdotin 39 days ago
I think it's mistaken to think in terms of 'falling behind' or 'catching up'
I've seen that these tools have different uses for different devs. I know on my current team, each of us devs works very differently to one another, and we make significant allowances to accommodate for one another's different styles. Certain tasks always go to certain devs; one dev is like a steel trap, another is the chaos explorer, another's a beginner, another has great big-picture perspective, etc. (not sure why but there's even space for myself ;)
In the same way, different devs use these powerful tools in very different ways. So don't imagine you're falling behind, because the only useful benchmark is yourself. And don't imagine you can wait for consensus: you'll still need to identify your personal relationship to the tools.
Most of all, don't be discouraged. Even if you never embrace these tools, there will remain space for your skills and your style of approaching our shared work.
Give it another 10 years and I'm sure this will all become clearer...
[-]
- ChrisMarshallNY 39 days ago
  I’ve become comfortable with using LLMs as “trusted advisors.”
  I am not [yet] ready to just let an agent write a whole app or server for me, but I am increasingly letting them write a whole function for me.
  They are also great “bug finders.” I can just feed some code, describe the symptoms, and ask for an observation. I often get great suggestions, including things like finding typos and copy/pasta problems.
  I find that just this limited application has significantly increased my development velocity, and, I believe, the quality of my work.
  [-]
  - wmoxam 39 days ago
    IMO LLMs make for a great rubber duck https://en.wikipedia.org/wiki/Rubber_duck_debugging
bgwalter 39 days ago
This is from the man who has no finished open source projects and who recommended camera-only FSD to Tesla, which he also did not finish.
The actually productive programmers, who wrote the stack that powers the economy before and after 2023, need not listen to these cheap commercials.
[-]
- anonnon 39 days ago
  > FSD to Tesla, which he also did not finish.
  That's why I've never understood HN's continuing infatuation with him. He failed to deliver FSD to Tesla, and arguably even sent them down a R&D dead end, and he doesn't seem to have played a significant role in the generative AI revolution, only joining OpenAI after they developed ChatGPT. Yet when his talks or blog posts get posted here, they're met with almost uniformly positive comments, often many.
  He reminds me of Sam Altman, where for a while, pointing out that pg's emperor was naked, that his first big "success" was a startup, Loopt, that devolved into a seedy, gaunt gay hookup app, slowly wasting away, that only got acquired thanks to face-saving VC string-pulling, and that that "success" was the springboard of all that followed (YC presidency, feeling out a gubernatorial campaign, OpenAI CEO)--that would get you swiftly flagged.
- CamperBob2 39 days ago
  > who recommended camera-only FSD to Tesla
  That's a bummer if true. Is there a reliable source that lays that decision at Karpathy's feet?
  [-]
  - bgwalter 39 days ago
    He was "AI" director at Tesla from 2017:
    https://www.teslarati.com/tesla-ai-director-hiring-autopilot...
    He gave a glowing recommendation for camera-only FSD in 2021:
    https://thenextweb.com/news/tesla-ai-chief-explains-self-dri...
    Then he left Tesla in 2022. So yes, you could argue that it was all Elon's fault and he just followed for 5 years. We won't know with 100% certainty, but I'd find it odd to stay 5 years if you think it doesn't work.
    [-]
    - CamperBob2 39 days ago
      Ouch, thanks for the cite.
      What a weird, dumb call that was. "I don't always tackle the toughest engineering problems where lawsuits and lives are at stake, but when I do, I chug a few beers first and tie one hand behind my back."
- threeducks 39 days ago
  > This is from the man who has no finished open source projects
  To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?
  The only projects that I can truly call "finished" are those that I have laid to rest because they have been superseded by newer technologies, not because they have achieved completeness, because there is always more to do.
  [-]
  - bgwalter 39 days ago
    Then replace "finished" with "production software".
  - bdangubic 39 days ago
    > not because they have achieved completeness, because there is always more to do.
    this is because SWEs love bloat and any good idea eventually needs to balloon into some ever-growing monstrosity :)
  - bdangubic 39 days ago
    > To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?
    https://github.com/left-pad
gaigalas 42 days ago
> Clearly some powerful alien tool was handed around except it comes with no manual
Using tools before their manual exists is the oldest human trick, not the newest.
bmitch3020 39 days ago
https://xcancel.com/karpathy/status/2004607146781278521
PaulDavisThe1st 39 days ago
He should join the Ardour project. Or go to work for Ableton or Bitwig or Presonus or Digidesign or MOTU or any other DAW manufacturer. Or any video or image editing application. Or get involved with more or less any complex, "creative" native desktop application.
All of the stuff he feels he is falling behind on? Almost completely irrelevant in our domain.
[-]
- senordevnyc 39 days ago
  That’s interesting. I wonder if the models will improve on these kinds of tiny niches?
- inimino 39 days ago
  Not for long.
  [-]
  - PaulDavisThe1st 38 days ago
    I'll bet the next 8 years of my career on it.
sureglymop 39 days ago
> strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering
Sounds fever dreamish. Thank you sincerely (not) for creating it!
xg15 39 days ago
And there it is again, the "powerful alien tool" that was just "handed to us".
No decades of research and massive allocation of resources over the last few years as well as very intentional decision making by tech leadership to develop this specific technology.
Nope, it just mysteriously dropped from the sky one day.
[-]
- layer8 39 days ago
  The point is that all that research mostly doesn’t help in mastering the tool. Unlike traditional tools, it doesn’t come with an instruction manual. It’s like an alien tool just handed to us in exactly that sense.
- Kuinox 39 days ago
  Do you know who the author is?
  [-]
  - techblueberry 39 days ago
    It’s written in the title of the post: “Andrej Karpathy”. He’s fairly well known in AI circles; he was head of Autopilot at Tesla and co-founded OpenAI. If you’re curious to learn more about him, the Wikipedia page has a short summary: https://en.wikipedia.org/wiki/Andrej_Karpathy
  - jeltz 39 days ago
    It is even worse coming from him.
  - xg15 39 days ago
    Yes, and I'm disappointed he seems to have joined the AI mysticism crowd.
optician_owl 39 days ago
Oh. This is a pretty stereotypical character among data scientists. This character thinks software development is all about generating text, and because they know how to generate text, they're automatically considered an expert. But they know nothing about the software lifecycle. Working with them is a real pain, especially when they shift responsibility for their "software" to your engineers.
cherry_tree 41 days ago
Behind who?
Is there someone already mastering “agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering” ?
And do they have a blog?
[-]
- lo_zamoyski 39 days ago
  > Behind who[m]?
  Why, the other rats in front of you in the race, of course!
  As the pithy, if cheesy, expression goes, read not the times; read the eternities. People who spend so much time frantically chasing superficial ephemera like this are people without any sense of life's purpose. They're cogs in some hellish consumerist machine.
clejack 39 days ago
For the folks who have more positive outlooks how often do you change your code after it's been generated?
I haven't used agents much for coding, but I noticed that when I do have something created with the slightest complexity, it's never perfect and I have to go back and change it. This is mostly fine, but when large chunks of code are created, I don't have much context for editing things manually.
It's like waking up in a new house that you've never seen before. Sure I recognize the type of rooms, the furniture, the outlets, appliances, plumbing, and so on when I see them; but my sense of orientation is strained.
This is my main issue at the moment.
[-]
- fragsworth 39 days ago
  > For the folks who have more positive outlooks how often do you change your code after it's been generated?
  Every time, unless my initial request was perfectly outlined in unambiguous pseudocode. It's just too easy to write ambiguous requests.
  Unambiguous but human-readable pseudocode is what I strive for now, though I will often ask AI to help edit the pseudocode to remove ambiguities prior to generating code.
presentation 39 days ago
Anything sufficiently useful will be productized and packaged up by somebody out there so that the masses can use it, the rest will be niche and only relevant for the most hardcore enthusiasts, so I’m not so worried.
_pdp_ 39 days ago
Over 20 years professional experience here. LLM tools feel great. A single person can now accomplish what used to require many teams.
[-]
- bopbopbop7 39 days ago
  Over 30 years code artisan here. AI has made me 100x more productive. No, I will not provide proof. Sam Altman is the best.
- llmslave2 39 days ago
  Over 80 years professional experience here. LLM tools feel great. A single person can now do the work of ten Google and Meta's in a single afternoon.
alphazard 39 days ago
The thing that always trips me up is the lack of isolation/sandboxing that all of the AI programming tools provide. I want to orchestrate a workforce of agents, but they can't be trusted not to run amok.
Does anyone have a better way to do this other than spinning up a cloud VM to run goose or claude or whatever poorly isolated agent tool?
[-]
- dnw 39 days ago
  I have seen Claude disable its sandbox. Here is the most recent example from a couple of weeks ago while debugging Rust: "The panic is due to sandbox restrictions, not code errors. Let me try again with the sandbox disabled:"
  I have since added a sandbox around my ~/dev/ folder using sandbox-exec in macOS. It is a pain to configure properly but at least I know where the sandbox is controlled.
  [-]
  - resfirestar 39 days ago
    That refers to the sandbox "escape hatch" [1]. Running a command without a sandbox is a separate approval, so you get another prompt even if that command has been pre-approved. Their system prompt [2] is too vague about what kinds of failures the sandbox can cause; in my experience the agent always jumps straight to disabling the sandbox if a command fails. Probably best to disable the escape hatch and deal with failures manually.
    [1] https://code.claude.com/docs/en/sandboxing#configure-sandbox...
    [2] https://github.com/Piebald-AI/claude-code-system-prompts/blo...
- shepherdjerred 39 days ago
  I'm working on a solution [0] for this. My current approach is:
  1. Create a new Git worktree
  2. Create a Docker container w/ bind mount
  3. Provide an interface for easily switching between your active worktrees/containers.
  For credentials, I have an HTTP/HTTPS mitm [1] that runs on the host with creds, so there are zero secrets in the container.
  The end goal is to be able to manage, say, 5-10 Claude instances at a time. I want something like Claude Code for Web, but self-hosted.
  [0]: https://github.com/shepherdjerred/monorepo/tree/main/package...
  [1]: https://github.com/shepherdjerred/monorepo/pull/156
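  Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration, not the code from the linked repo: the helper names, the `/workspace` mount point, the `<repo>-<name>` directory layout, and the image name are all assumptions. The functions only build the `git worktree add` and `docker run` command lines; they do not execute anything.

```python
import shlex
from pathlib import PurePosixPath


def worktree_command(repo: PurePosixPath, name: str) -> list[str]:
    """Step 1: create a new Git worktree on its own branch.

    The sibling-directory layout (<repo>-<name>) is an assumption.
    """
    worktree_path = repo.parent / f"{repo.name}-{name}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", name, str(worktree_path)]


def container_command(worktree_path: PurePosixPath, image: str) -> list[str]:
    """Step 2: run a container with the worktree bind-mounted.

    Only the worktree is mounted, so the agent cannot see the rest of the host.
    The /workspace mount point is a hypothetical choice.
    """
    return [
        "docker", "run", "--rm", "-it",
        "--name", worktree_path.name,
        "-v", f"{worktree_path}:/workspace",
        "-w", "/workspace",
        image,
    ]


if __name__ == "__main__":
    repo = PurePosixPath("/home/dev/myproject")  # hypothetical repo location
    print(shlex.join(worktree_command(repo, "agent-1")))
    print(shlex.join(container_command(repo.parent / "myproject-agent-1", "agent-image:latest")))
```

  Building the argument lists separately from running them keeps the isolation boundary easy to review: the only host path the container sees is the one passed to `-v`.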
  [-]
  - aoeusnth1 39 days ago
    This is also what I did. Actually, Claude did it.
- ciconia 39 days ago
  If they cannot be trusted, why would you use them in the first place?
  [-]
  - zephen 39 days ago
    Obviously people perceive value there, but on the surface it does seem odd.
    "These things are more destructive than your average toddler, so you need to have a fence in place kind of like that one in Jurassic Park, except you need to make sure it absolutely positively cannot be shut off, but all this effort is worthwhile, because, kind of like civets, some of the artifacts they shit out while they are running amok appear to have some value."
    [-]
    - chasd00 39 days ago
      It’s shocking, the collective shrug I get from our security people at work. I attend pretty serious meetings about genAI implementations, and when I ask about points of view around security, given that things as crazy as “adversarial poetry” are real, I just get shrugs. I get the feeling they don’t want to be the ones to say “no, don’t bring genai to our clients” but also won’t dare say “yes, our client’s data is safe with integrated genai”.
    - ares623 39 days ago
      Love the mix of metaphors.
  - CamperBob2 39 days ago
    For the same reason you'd build a fire.
- ashishb 39 days ago
  I run them inside a sandbox https://github.com/ashishb/amazing-sandbox
tired_and_awake 38 days ago
Early adopters will early adopt. They will toil, give feedback, improve, repeat. Tools will over optimize then dramatically shift based on learnings.
This chaos will continue until something moderately productive and easily adoptable comes out. FOMO will strike all of us from time to time. Some of us will even try out the latest and greatest and see if it sticks.
Some companies will mandate arbitrary code generation standards because "it's the basis of their success", and it will polarize their talent pool. Later, it will be impossible to determine if they were (not) successful "in spite of" or "because of" such wild decisions.
tehjoker 39 days ago
The person saying this has a financial interest in saying so.
finolex1 39 days ago
Is there anything substantial in his list ("agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations") that Claude Code or Cursor don't already incorporate?
I empathize with his sense that if we could just provide the right context and development harness to an AI model, we could be *that* much more productive, but it might just be misplaced hope. Claude Code and Cursor are probably not that far from the current frontier for LLM development environments.
zmmmmm 39 days ago
Definitely don't hang out on Hacker News then. It's absolutely the worst place for imposter syndrome or people with any kind of skill inferiority anxiety or confidence issue. Half the reason I read HN is because the anxiety it induces is moderately constructive in motivating me to ensure I keep learning and stay up to date. But I definitely come away every day with a distinct impression I'm below baseline in skill and knowledge for my field, even though within my own circles I'm considered expert by all my peers.
[-]
- kubb 39 days ago
  Really? It’s the opposite for me. The number of people being confident about things they have no clue about just makes me more arrogant.
  [-]
  - zmmmmm 38 days ago
    I guess I get both effects. Every now and then there is a post about something I'm actually expert in and the standard of the comments sometimes shocks me. It's hard to connect the two experiences.
robotresearcher 39 days ago
Andrej is 39 years old, according to Wikipedia.
Douglas Adams on age and relating to technology:
"1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.
2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
3. Anything invented after you’re thirty-five is against the natural order of things."
From 'The Salmon of Doubt' (2002)
[-]
- aoeusnth1 39 days ago
  Is that really what he's saying here?
  He's not against the technology, I think he's just feeling like there's a lot of potential that he's not quite grasping yet.
  [-]
  - BearOso 39 days ago
    This guy is one of the top names in AI. This is pure propaganda written to instill "fear of missing out" and encourage people to buy into his platform, lest they become "obsolete."
    [-]
    - PaulHoule 39 days ago
      It’s a little shocking to me that this sentiment hasn’t floated higher in the discussion. Regardless of how he feels, this is the way he wants you to feel.
      Big picture it’s about emotional intelligence and if you are losing your shit you’re going to flail around. I think you should pick up some near-frontier tools and use them to improve your usual process, always keeping your feet on the ground. “Vibe coding” was always about getting you and keeping you over your head. Resist it!
      [-]
      - gsf_emergency_6 39 days ago
        vive vibe live or it doesnt matter?
        Maybe Devs should handle copilots as Swiss prana-bindu their shots
        (Therefore gun laws at a longer timescale)
        Of course we have to ask aeb if he has ever run into someone who trips (only, of course) while hunting ;) have you?
        [-]
        aebtebeten 39 days ago
        the french on the good hunter^W vibe coder vs the bad vibe coder: https://www.youtube.com/watch?v=QuGcoOJKXT8
        given that the 3 hares seem to currently lack a signification, I'd be up for squatting? Or would Paul prefer 3 fennecs? Should anyone wish to oppose us, as Bigwig said: "silflay hraka, u embleer rah"
        a slightly more pragmatic story for shunya as better mousetrap: just as we now routinely have our calculations done for us in binary, but record results in decimal (in PDF invoices, say), ancient romans (among other cultures) would have someone do their calculations on a counting board (https://en.wikipedia.org/wiki/Counting_board), but recorded (only the non-zero) results in roman numerals.
        (these days we can spot the algebraists via a shibboleth: they start their papers and books with section/chapter 0)
        > « Les hommes sont comme les chiffres : ils n'acquièrent de valeur que par leur position. » —NB
        [-]
        gsf_emergency_6 38 days ago
        How we seem to be doing:
        https://www.neatorama.com/2012/05/18/10-facts-you-might-not-...
        Re boney quote, that's one heuristic for HN mods
        TIL Mozilla would have done better channelling the Finnic fennec (Vs rebranding "pinko"). Globe-wrappin Oxygen Auroras it wasn't.
        Haploid fox
        https://en.wikipedia.org/wiki/Inari_%C5%8Ckami#:~:text=The%2...
        weregiraffe 39 days ago
        Are you grok, or having a stroke?
        [-]
        MarcelOlsz 39 days ago
        I understood perfectly what he's saying, but then again schizo is a language I speak fluently. Are you having a stroke?
        [-]
        PaulHoule 38 days ago
        To be fair I did have just a touch of thought disorder which led me to write "vive" instead of "vibe" and I did correct it when it was pointed out without explaining it which made that comment seem even weirder than it originally was.
        [-]
        Izkata 38 days ago
        I actually read their comment as "vibe vibe live" which combined with the unknown terms in the next line (a reference to Dune combined with something else, I guess?) made GGP's question fit quite well.
    - 8note 39 days ago
      on the other hand, it does currently feel like when angular and react were starting to come out, and there were a billion different javascript libraries to learn with a new one coming out every couple weeks, and you aren't quite sure what you should spend your time on and how much, vs now where you just learn react, and maybe extend to next.js
      LLM forward development has a lot of things going on, and it really isn't clear yet what is going to be the common standard in a few years time in terms of dev ux, async tools, ci/cd tools, in production and offline workflows, etc.
      it's an easy time to hop down a wrong path picking subpar tools or not experimenting further, but if you just wait, the people who try the right tools are going to be way ahead on making products for their customers.
    - sailingparrot 39 days ago
      Uncharitable take. His last public stance on this, a few months ago when he released nanochat, was that he didn’t use a coding LLM for it, even though he tried, because they were not good enough and he was just losing time, so he coded everything manually. Andrej is already set for life, and has moved into education where most of what he does is released for free.
    - neilv 39 days ago
      Exactly. I think some of the commenters were unaware of some of the context, and got an entirely different read on the piece.
  - robotresearcher 39 days ago
    > Is that really what he's saying here?
    No it’s absolutely not. But I thought it’d be fun to offer Adams’ brilliant hyperbole for an affectionate ribbing of Karpathy. Both of them are great communicators of ideas.
- BonoboIO 39 days ago
  This is pretty much the thinking across all German-speaking countries. It especially applies to anything related to energy (combustion engines, coal, gas, oil) and IT.
  Case in point: fax machines are still an important part of business communication in Germany, and many IT projects are genuinely amateurish garbage — because the underlying mindset is "everything should stay exactly as it is."
  This is particularly visible in the 45+ generation. It mostly doesn't apply to programmers, since they tend to find new things interesting. But in the rest of society, the effects are painful to watch: if nothing changes, nothing improves.
  And then there's mobile infrastructure. It's not even a technical problem — it's purely political. The networks simply don't get expanded. It's honestly embarrassing how far behind Germany is compared to the rest of Europe.
  [-]
  - anthk 39 days ago
    Spain is the same with Java, and that language is full of bureaucratic bullshit to make middle managers feel better. Ditto with PowerPoint and the like. They need to disappear for good.
    Something like the PDFs produced by sent(1) under Unix, or MagicPoint presentations, is far less fancy and lets you produce effective, no-bullshit presentations based on the ACTUAL product. But then half of the salespeople and managers would be shown to be useless (as they are) and they would be kicked out fast. And don't get me started on nepotism...
- ilaksh 39 days ago
  You didn't really read what he wrote or think about it and just took it as an opportunity to dismiss him as old. He was just being humble. It's relatively new to everyone. At least you are honest about your ageism.
  I am sure Karpathy can and does leverage AI as well or better than you. Probably I do also and I am 48.
  [-]
  - robotresearcher 39 days ago
    I’m older than both of you.
paxys 39 days ago
I have never felt this much ahead as a programmer. So many developers I see, including at my workplace, are blindly prompting models hoping to solve their problem and failing every step of the way. The people who truly understand what is happening are still in the ruling class, and their skills are not going to be irrelevant anytime soon.
[-]
- sod22 39 days ago
  Yep when all this blows over, those who were least exposed to LLMs will be the winners. Patience is important and not to be drowned out by the noise.
- georgeburdell 39 days ago
  Not sure what you mean by blindly prompting models
- misiti3780 39 days ago
  100% - I can't believe there are smart people in this conversation that don't see this.
  If you don't understand AWS you can't vibe code a terraform codebase that creates a complex infrastructure .. etc
albert_e 39 days ago
I can attest to one thing that has grown 10x for sure -- FOMO.
kusokurae 39 days ago
This is sales propaganda that should not be endorsed by sharing or further publication.
design2203 42 days ago
I’m convinced much of this is all noise - people seem to be focusing on the wrong unit of analysis. Producing software and lots of it has never been a problem - coming up with the right projects and producing a vertically differentiated product to what already exists is.
[-]
- rishabhaiover 42 days ago
  That's true. The noise is being generated by people who are directly or indirectly incentivized to talk about it.
  > coming up with the right projects and producing a vertically differentiated product to what already exists is.
  Agreed but not all engineers are involved with this aspect of the business and the concern applies to them.
deadbabe 39 days ago
I think this is mostly a frontend sentiment.
In the backend, we're mostly just pushing data around from one place to another. Not much changes, there's only a few ways to really do that. Your data structures change, but ultimately the work is the same. You don't even really need an LLM at all, or super complex frameworks and ORMs, etc.
[-]
- fallat 39 days ago
  Sounds like you'd use an LLM exactly for that.
  We don't need you.
  [-]
  - deadbabe 39 days ago
    Why pay for LLM when you can just do it easily for free?
    The end goal is to get rid of all frontends anyway, just have apps that you interact with through LLM prompts. A more advanced command line.
nineteen999 39 days ago
I'm actually having more fun than I've had in years with this, since I've mainly focussed on my personal projects while getting the hang of what's achievable. And it turns out to be quite a lot if you're a creative thinker.
At first it kind of depressed me, but now I realised that actually writing code is only part of my day job, the rest is integrating infrastructure and managing people and enabling them to do their job as well, and if I can do the coding/integration part faster and give them better tools more quickly, that's a huge win.
This means I can spend more time at the beach and on my physical and mental well being as well. I was stubborn and skeptical a year ago, but now I'm just really enjoying the process of learning new things.
tjr 42 days ago
Being a nondeterministic tool, the output for a given input can vary. Rather than having a solid plan of, "if I provide this input, then that will happen", it's more like, "if I do something like this, I can expect something like that, probably, and if not, then try again until it works, I suppose".
What are the productivity gains? Obviously, it must vary. The quality of the tool output varies based on numerous criteria, including what programming language is being used and what problem is being solved. The fact that person A gets a 10x productivity increase on their project does not mean that person B will also get a 10x productivity increase on their project, no matter how well they use the tool.
But again, tool usage itself is variable. Person A themselves might get a 10x boost one time, and 8x another time, and 4x another time, and 2x another time.
[-]
- grim_io 42 days ago
  Non determinism does not imply non correctness. You can have the LLM do 10 different outputs, but maybe all 10 are valid solutions. Some might be more optimal in certain situations, and some might appeal to different people aesthetically.
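  A toy illustration of that point, with hand-written stand-ins rather than actual LLM output: three different implementations of the same hypothetical task (remove duplicates, keep first-seen order) that all pass the same checks. The function names and test cases are made up for this sketch.

```python
def dedupe_a(items):
    """Track seen values in a set while building the output list."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_b(items):
    """Rely on dicts preserving insertion order (Python 3.7+)."""
    return list(dict.fromkeys(items))


def dedupe_c(items):
    """Quadratic but simple: keep x only if it has not appeared earlier."""
    return [x for i, x in enumerate(items) if x not in items[:i]]


def is_valid(fn):
    """The shared acceptance check every candidate must pass."""
    cases = [[], [1, 1, 2], [3, 1, 3, 2, 1], ["a", "b", "a"]]
    expected = [[], [1, 2], [3, 1, 2], ["a", "b"]]
    return all(fn(list(c)) == e for c, e in zip(cases, expected))


if __name__ == "__main__":
    for fn in (dedupe_a, dedupe_b, dedupe_c):
        print(fn.__name__, is_valid(fn))
```

  The three differ in speed and style, which is where the "more optimal in certain situations" and "aesthetically" parts come in, but the acceptance check cannot tell them apart.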
  [-]
  - tjr 42 days ago
    Nondeterminism indeed does not imply non-correctness.
    All ten outputs might be valid. All ten will almost certainly be different -- though even that is not guaranteed.
    The OP referred to the notion of there being no manual; we have to figure out how to use the tool ourselves.
    A traditional programming tool manual would explain that you can provide input X and expect output Y. Do this, and that will happen. It is not so clear-cut with AI tools, because they are -- by default, in popular configurations -- nondeterministic.
    [-]
    - grim_io 42 days ago
      We are one functional output guarantee away from them being optimizing compilers.
      Of course, we maybe never get there :)
      [-]
      - tjr 42 days ago
        Why would one opt to use an LLM-based AI tool as a compiler? It seems that would be extraordinarily complex over traditional compilers, but for what benefit?
        [-]
        grim_io 42 days ago
        It would be, in its ideal state, a vague-problem-to-concrete-and-robust-implementation compiler.
        A star trek replicator for software.
        Obviously we are nowhere near that, and we may never arrive. But this is the big bet.
        [-]
        optimalsolver 40 days ago
        >A star trek replicator for software
        That's a very interesting way to put it.
- general1465 42 days ago
  The nondeterminism of AI feels like a compiler which, given the same input code, spits out a different executable on every run. Fixing bugs will become more like a ritual to satisfy the whims of the machine spirit.
  [-]
  - fragmede 42 days ago
    But how different? Compilers do, in fact, spit out different binaries with each run. There are timestamps and other subtle details embedded in them (esp compiler version and linking) that make the same source result in a different binary. "That's different"; "that's not the same thing!" I see you thinking. As long as the AI prompt "make me a login screen" results in a login screen appropriate for the rest of the code, and not "rm -rf ~/", does it matter if the indeterminism produces a login page with a Google login page before the email login button or after?
- stack_framer 42 days ago
  Also interesting is the possibility that a 10x boost for person A might still be slower than person B not using AI.
zmj 39 days ago
Yes-ish. It's worth keeping up with the rising tide of model capabilities, but it's not worth stressing over eliciting every last drop. Many of the specific techniques that add value today will be wasted effort with smarter models in a month or two.
Uptrenda 39 days ago
AI hype man in AI continues to hype AI. Who could have predicted this.
netdevphoenix 39 days ago
I used to hold Karpathy in high esteem. But the stream of posts coming from him since LLMs took over the "AI" word makes me wonder if he has lost the spark
budududuroiu 39 days ago
Idk why people take everything that Karpathy says as canon. I find his takes post inventing the "vibe coding" term deeply unserious and vapid
[-]
- ex-aws-dude 39 days ago
  Yeah I don’t get it, there are certain people where even just their tweets are front page headlines on HN
kiriakosv 39 days ago
I may be extremely ignorant here but I think Karpathy is first and foremost a great pitchman and salesman, not only for AI in general but for his personal brand as well.
He is also great at explaining AI related concepts to the masses.
However his takes on software engineering show someone that hasn’t spent a significant amount of time doing production grade software engineering, and that is perfectly fine and completely normal given his background.
But that also means that we should not take his software engineering opinions as gospel.
oulipo2 39 days ago
"Guy who builds AI for a living is telling people to believe that AI is the single best thing invented since butter" how surprising
Animats 39 days ago
I feel that way, too.
"Vibe programming" is less than a year old. What is programming going to look like in a few years?
badgersnake 39 days ago
If there was more substance behind the hype this might actually be true. But unless you’re in some very specific niches, it’s bollocks.
You’re not doing it wrong, the tools just aren’t all they’re cracked up to be. They are annoyingly good enough to get you to waste a load of time trying to get them to do what it looks like they should be able to do.
culi 39 days ago
Yeah the author of this tweet is
> Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI
1970-01-01 39 days ago
I love that Agile and Scrum are still unmentioned. Can we stick a fork in them yet?
[-]
- layer8 39 days ago
  Don’t you do retrospectives with your coding agents?
- zephen 39 days ago
  No, no, no.
  We need to have a scrum with 3 agents each from the top 4 AI vendors, with each agent adhering to instructions given by a different programmer.
  It's kind of like Robot Wars, except the damage is less physical and more costly.
PaulHoule 39 days ago
I don't have a lot of patience for this sort of take because my north star is project management and in my normal moving forward model I work in milestones where I stack up my tools and get something specific done and screwing around with tools is heavily timeboxed. If A.I. tools help me make progress great, if they don't, I will fall back to manual methods, get that phase of work done or (rarely) give up on the subproject. After I get some distance from it I can consolidate my learnings, try a different approach.
It's death though to be excessively reading tweets and blogs about this stuff; this will have you exhausted before you even try a real project and comparing yourself to other people's claims which are sometimes lies, often delusional, ungrounded and almost always self-serving. Insofar as someone is getting things done with any consistency, they are practicing basic PM, treating feelings of exhaustion, ungroundedness and especially going in circles as a sign to regroup, slow down and focus on the end you have in mind.
If the point really is to research tools then what you do is break down that work into attainable chunks, the way you break down any other kind of work.
kshri24 39 days ago
> Roll up your sleeves to not fall behind
This confirms AI bubble for me and it now being entirely FUD driven. "Not fall behind" should only apply to technologies where you have to put active effort to learn as it requires years to hone and master the craft. AI is supposed to remove this "active effort" part so as to get you up to speed with the latest and bridge the gap between those "who know" and those "who do not". The fact you need to say "roll up your sleeves to not fall behind" confirms we are not in that situation yet.
In other words, it is the same old learning curve that everyone has to cross EXCEPT this time it is probabilistic instead of linear/exponential. It is quite literally a slightly better than coin toss situation when it comes to you learning the right way or not.
For me personally, we are truly in that zone of zero active effort and total replacement when AI can hit a 100% on ALL METRICS consistently, every single time, even on fresh datasets with challenging questions NOT SEEN/TRAINED by the model. Even better if it can come up with novel discoveries to remove any doubts. Chances of achieving that with current tech is 0%.
ciconia 39 days ago
I for one am not using AI, will not touch that steaming pile of manure with a 10 yard stick, and I couldn't care less about the so called magnitude 9 earthquake. When this bubble finally bursts into nothingness, I'll be still here practicing my craft and providing real value for my clients.
[-]
- llmslave2 39 days ago
  I'm using it less and less now, since the sheen has worn off and I've been able to more accurately judge its capabilities. It's like an intern at everything it does and unfortunately I'm expected to produce better code than that.
  [-]
  - Capricorn2481 39 days ago
    I'm very confused, are you or are you not an LLM run account?
    A couple weeks ago, under a freshly made account "llmslave", you said it's already replacing devs and the field is cooked, and anyone who doesn't see that lacks the skills to adopt AI [1]
    I pointed out that given your name and low quality comments, you were likely an LLM run account. As SOON as I made that comment, you abandoned the account and have now made a duplicate llmslave2 account, with a different opinion
    Are you doing an experiment or something?
    [1] https://news.ycombinator.com/item?id=46291504#46292968
    [-]
    - llmslave2 39 days ago
      No, I'm just a fan account. No affiliation with the OG llmslave, I just thought the name and concept was funny.
      [-]
      - Capricorn2481 39 days ago
        Thanks for confirming what I thought.
  - tehlike 39 days ago
    When was the last time you used it?
    [-]
    - llmslave2 39 days ago
      An agent like Claude code? Maybe a few weeks ago. I use ai autocomplete and ask Claude to explain basic stuff outside my wheelhouse, generate throwaway bash scripts, etc. And I have Claude review code I'm unsure of / rubber ducky debugging, but that's about it.
davesque 39 days ago
Honestly surprised at this take by him. For one, feels like exaggeration. For two, are these tools really that hard to use?
[-]
- krackers 39 days ago
  I'm surprised too, considering that in https://x.com/karpathy/status/1977758204139331904 he mentioned regarding his NanoChat repo
  >Good question, it's basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn't work well enough at all and net unhelpful, possibly the repo is too far off the data distribution.
  And a lot of the tooling he mentioned in OP seems like self-imposed, unnecessary complexity/churn. For the longest time you could say the same about frontend, that you're so behind if you're not adopting {tailwind, react, nodejs, angular, svelte, vue}.
  At the end of the day, for the things that an LLM does well, you can achieve roughly the same quality of results by "manually" pasting in relevant code context and asking your question. In cases where this doesn't work, I'm not convinced that wrapping it in an agentic harness will give you that much better results.
  Most bespoke agent harnesses are obsoleted by the time of the next model release anyway, the two paradigms that seem to reliably work are "manual" LLM invocation and LLM with access to CLI.
- moduspol 39 days ago
  I think the evidence is that even amongst evangelists, they all seem to have different sets of key techniques that change every few months.
- encyclopedism 39 days ago
  > are these tools really that hard to use?
  Exactly! If people have 'never felt this far behind' and the LLM's are that good. Ask the LLM to teach you.
  Like so many articles on 'prompt engineers' this (never felt this behind) take too is laughable. Programmers having learnt how to program (writing algorithms, understanding data structures, reading source code and API docs) are now completely incapable of using a text box to input prompts? Nor can they learn how to quickly enough! And it's somehow more difficult than what they have routinely been doing? LOL
  [-]
  - cheevly 39 days ago
    Frontier AI isn't trained on frontier AI. I wish HN would collectively stop and actually think before they post.
anonzzzies 39 days ago
I wish I got out more. I used to go a lot to meetups and sit next to people 'closer to the hype' showing me the cutting edge stuff; often it was just a 'meh' experience vs the 'this is like seeing god' type of comments on hn/reddit and sometimes it is an eye opener (rarely). The 'meh' is usually when people claim it is 10000x more productive: I sit next to them and see them struggle to get even the basics done; after that, they struggle with the same issues I do when I try it while they are the 'experts' and I learn that people call things productive when they are kept 'busy', not when they are actually producing results faster.
Anyway:
> agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations,
give me extreme Emacs 'setup' feelings: I was at a meetup in hk recently where there was someone advocating this and it was just depressing; spending hours on stuff that changes daily while just my vanilla claude code with playwright mcp runs circles around it, even after it has been set up. It is just not better at all and until someone can show that it is actually an improvement WITH the caveat that when it is an improvement on t(1), it doesn't need a complete overhaul at t(n) where n is a few days or weeks just because the hype machine says so. This measured against a vanilla CC without any added tooling except maybe playwright mcp.
People just want to scam themselves in feeling useful: if the ai does the work, then you find some way of feeling busy by adding and finetuning stuff to feel useful.
halfmatthalfcat 39 days ago
Wow - can we coin "Slopbrain" for people who are so far gone into AI eventualism that they can no longer function? Like "cooked" but "slopped" or something. Good grief lol. Talk about getting lost in the sauce...
[-]
- roadside_picnic 39 days ago
  WSJ has been writing increasingly about "AI Psychosis" (here's their most recent piece [0]).
  I'm increasingly seeing that this is the real threat of AI. I've personally known people who have started to strain relationships with friends and family because they sincerely believe they are evolving into something new. While not as dramatic, the normalization of the use of "AI as therapist" is equally concerning. I know tons of people that rely on LLMs to guide them in difficult family decisions, career decisions, etc on an almost daily basis. If I'm honest, I myself have had times where I've leaned into this too much. I've also had times where AI starts telling me how clever I am, but thankfully a lifetime of low self worth signals warning flags in my brain when I hear this stuff! For most people, there is real temptation to buy into the praise.
  Seeing Karpathy claim he can't keep up was shocking. It also immediately raises the question to anyone with a clear head: "Wait, if even Karpathy cannot use these tools effectively... just what is so useful about AI?" Isn't the entire point of AI that I can merely describe my problem and have a solution in a fraction of the time.
  The fact that so many true believers in AI seem to forever be just a few more tricks away from really unleashing this power, starts to make it feel very much like magical thinking on a huge scale.
  The real danger of AI is that we're entering into an era of mass hallucination across multiple fields and areas of human activity.
  0. https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d...
  [-]
  - tyre 39 days ago
    > I've personally known people who have started to strain relationships with friends and family because they sincerely believe they are evolving into something new.
    Cryptoboys did it first, please recognize their innovation ty
  - lukev 39 days ago
    That's NOT AI psychosis, which is real, and which I've seen close-up.
    AI psychosis is getting lost in the sauce and becoming too intimate with your ChatGPT instance, or believing it's something it's not.
    Skepticism, or a fear of being outside the core loop is the exact opposite, and that's what Karpathy is talking about here. If anything, this kind of post is an indicator that you're absolutely NOT in AI psychosis.
    [-]
    - tom_ 39 days ago
      "the core loop"? What is this?
  - sho_hn 39 days ago
    Cyberpunk was right!
  - bentobean 39 days ago
    I would really like to hear more about these acquaintances who think they are evolving.
  - timcobb 39 days ago
    WSJ is Fox News Platinum, I wouldn't overthink it
- johnfn 39 days ago
  I feel Karpathy is smart enough to deserve a less dismissive response than this.
  [-]
  - halfmatthalfcat 39 days ago
    A mix of "too clever by half" and "never meet your heroes".
  - rideontime 39 days ago
    Why do you feel that way?
    [-]
  - techblueberry 39 days ago
    You think we should appeal to authority rather than address the ideas on their own merits?
    [-]
    - johnfn 39 days ago
      How is saying the author has “slopbrain” “addressing the idea on its own merits”? It’s just name calling.
      [-]
      - halfmatthalfcat 39 days ago
        They aren't addressing my comment (which is obviously an overreaction to the tweet); they're asking you why we should appeal to authority rather than evaluate whether Karpathy is completely overreacting and in way too deep.
        [-]
        johnfn 39 days ago
        The intent of my comment was to state that you should write something more substantive than dismissing Karpathy as “slopbrain”. I wasn’t appealing to authority by saying that he was correct — just that he deserves more than name calling in a response.
        [-]
        halfmatthalfcat 39 days ago
        Evidently by "LLM/AI psychosis" coming into the mainstream zeitgeist, "slopbrain" isn't too far off.
        [-]
        johnfn 39 days ago
        Now you're just saying "AI psychosis exists" (true) and then saying Karpathy has it. That is, again, essentially name calling, like saying someone is insane rather than addressing their points.
        If you really think Karpathy is psychotic you should explain why, but I don't think anything in the Tweet suggests that. My read of his tweet is that there is a lot of churn and new concepts in the software engineering industry, and that doesn't seem like a very psychotic thing to say.
- throwatdem12311 39 days ago
  I call it being "oneshot" by the AI.
- dvrp 39 days ago
  Twitter folks call this LLM or AI Psychosis.
- Starlevel004 39 days ago
  We could call it "Hacker News syndrome"
- calf 39 days ago
  Slopbrain is interesting because Karpathy's fallacious argumentation mirrors the glib argument of an LLM/AI, it's like cognitively recursive, one feeding the other in a self-selecting manner.
- weregiraffe 39 days ago
  Slippery slop?
- llmslave2 39 days ago
  [flagged]
  [-]
  - Neywiny 39 days ago
    This is what I keep hearing. "You just need something more agentic" "if you had the context length you could've fixed that" etc etc. yeah sure. I'll believe it when I see it. For me it's parsing 3000 page manuals for relevant data. I can do it fairly competently from experience, but I see a lot of people not familiar with them struggle to extract the info they need, and AIs just cannot hold all that context in my experience
leecommamichael 41 days ago
Mind you he is in the industry, and founding a company whose success depends on this stuff.
[-]
- overtone1000 39 days ago
  He meant to post that from his alt account 'regularcoderguy'
themafia 39 days ago
> and a failure to claim the boost feels decidedly like skill issue.
And a failure to clarify the project you're currently working on and the actual results feels decidedly like a propaganda issue.
Take all the digs at my skills you want. I'd rather not be a bald faced liar.
dude250711 42 days ago
Man, this is giving me a cognitive dissonance compared to my experiences.
Actually, even the post itself reads like a cognitive dissonance with a dash of the usual "if it's not working for you then you are using it wrong" defence.
[-]
- credit_guy 42 days ago
  I feel exactly like Karpathy here. I have some work to do, and I know exactly what I need to do, and I'm able to explain it to AI, and the AI seems to understand me (I'm lately using Opus 4.5). I wrote down a roadmap, it should take me a few weeks of coding. It feels like with a proper workflow with AI agents, this work should be doable in one or two days. Yet, I know by now that it's not going to be nearly that fast. I'll be lucky if I finish 30% faster than if I just code the entire damn thing myself. The thing is, I am a huge AI optimist, I'm not one of the AI skeptics, not even close. Karpathy is not an AI skeptic. We just both feel this sense of possibility, and the fact that we can't make AI help us more is frustrating. That's all. There's no telling anyone else "it's on you if you can't make it work for you". I think Karpathy figured out by now, and at least I did, that the number of AI skeptics by now far outnumbers the number of AI optimists, and it has become something akin to a political conviction. It's quite futile to try and change someone's mind about whether AI is good, bad, overhyped, underused, etc. People picked their side and that's that.
  [-]
  - design2203 42 days ago
    “We just both feel this sense of possibility, and the fact that we can't make AI help us more is frustrating”
    The mirage is alluring.
    [-]
    - nextworddev 42 days ago
      The real mirage is the utility of median developers
      [-]
      - jeltz 39 days ago
        I think with better processes and training they could be. It is just that right now we do not train them and put them through scrum and other horrible processes. Median developers are bad due to bad management.
      - jennyholzer3 39 days ago
        give them better incentives
  - orwin 39 days ago
    If I can reassure you, if your project is complex enough and involves heavy data manipulation, a 30% improvement using Opus/Gemini 3/codex 5.2 seems like a good result. I think on complex tasks, Opus 4.5 improves my output by around 20-25%.
    And since it's way, way less wrong than sonnet4, it might also improve my whole team velocity.
    I won't lie, AI coding has been a net negative for the 'lazy devs' on my team who don't delve into their own generated code (by 'lazy devs' here I mean the subset of devs who do the work but often don't bother to truly understand the logic behind what they used/did. They are very good coworkers, add value and are not really lazy, but I don't see another term for that).
  - llmslave2 39 days ago
    I think you articulated perfectly why it's a bubble and why execs are so eager to push it everywhere. It's so alluring, it constantly feels like we're on the verge of something great. No wonder so many people have their brains fried by it.
    [-]
    - anthonypasq 39 days ago
      we're 10 months into agentic coding. Claude code came out in march. I dont understand how you are so unimaginative to think what this might look like in 5 years even with slow progress.
      [-]
      - llmslave2 39 days ago
        It might be genuinely useful in 5 years, my issue is how it's being marketed now. We're 6 months into "AI will be writing 90% of code in three months" among other ridiculous statements.
        [-]
        jeltz 39 days ago
        Agreed. It is very similar to gambling in how it tricks the human mind. I am sure some of this AI technology will prove to be useful but the breakthrough has been just around the corner since soon after ChatGPT was released.
        jennyholzer3 39 days ago
        I don't mean to be inflammatory but I am not at all convinced that LLMs will be useful for software development in 5 years!
        I think LLMs are very well marketed but I don't think they're very good at writing code and I don't think they've gotten better at it!
        [-]
        llmslave2 39 days ago
        I sort of agree. If anything I feel like they've gotten a bit worse, but the advances in the tooling around them (eg claude code) has masked that slightly.
        I think they are useful as an augmentation, but largely valueless for directly outputting code. Who knows if that will change. It's still made me more productive as a dev despite not oneshotting entire files. It's just not industry-changing, at least yet.
- TeodorDyakov 42 days ago
  I think of it this way. If you dropped Einstein with a time machine two thousand years ago, people would think he is some crazy guy doing scribbles in the sand. No one would ever know how smart he is. It is the same with people and advanced AGI like Gemini 3 Pro or Chatgpt 5.2 Pro. We are just dumber than them.
  [-]
  - sponnath 42 days ago
    Why do you think the models are AGI?
    I also like to think that Einstein would be smart enough to explain things from a common point of understanding if you did drop him 2000 years in the past (assuming he also possesses the scientific knowledge humanity accrued in that 2000 year gap). So, your analogy doesn't really make a lot of sense here. I also doubt he'd be able to prove his theories with the technology of the past but that's a different matter.
    If we did have AGI models, they would be able to solve our hardest problems (assuming a generous definition of AGI) even if we didn't immediately understand exactly how they got there. We already have a lot of complex systems that most people don't fully understand but can certainly verify the quality of. The whole "too smart for people to understand that they're too smart" is just a tired trope.
  - clayhacks 42 days ago
    You are certainly dumber than them if you think they are AGI. These models are smart and getting smarter, but they are not AGI.
  - csto12 42 days ago
    You think they have “advanced AGI” and are worried about keeping up with the software industry? There would be nothing to keep up with at that point.
    To use an analogy, it would be like spending all your time before a battle making sure your knife is sharp when your opponent has a tank.
  - billywhizz 39 days ago
    > We are just dumber than them.
    you are, for sure.
globular-toast 39 days ago
I don't usually post something like this, but this is so fucking stupid. I'm prepared to stand by that. Let's see in a few years if I'm right.
"AI" is literally models trained to make you think it's intelligent. That's it. It's like the ultimate "algorithm" or addiction machine. It's trained to make you think it's amazing and magical and therefore you think it's amazing and magical.
[-]
- viraptor 39 days ago
  This could apply if we looked at questions in vacuum - someone had a conversation and was judging the models based on that. But some of us just use it for work and get good results daily. "Intelligent" is irrelevant; it's "useful". It doesn't matter what feelings I have about it if it saves me 2h of typing from time to time.
  [-]
  - chasd00 39 days ago
    To me, as just another kinda old (I’m 49) swe, the biggest benefit of using an LLM tool is it saves a shit ton of typing. I know what I want and I know when it’s right, just saving me from typing it all out is worth $20 bucks a month.
- kakapo5672 39 days ago
  Recently I needed to summarize about a thousand lengthy documents, and then translate those summaries into Mandarin.
  I spent about a minute composing the prompt for this task, and then went for a cup of coffee. When I got back the task was done. I spot-checked the summaries and they were excellent.
  I thought this was amazing and magical at the time. Am I wrong? Or is it simply the AI making me think this result was amazing and magical?
  [-]
  - llmslave2 39 days ago
    This is an LLM's bread and butter so I would hope it does a decent job.
    You just spot checked it, so how can you be sure how accurate it is. Was it 80% accurate? 90%? 99%? And how does the domain influence the accuracy requirements?
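    One rough way to put a number on a spot check, assuming n summaries were picked at random and none turned out wrong (the comment above doesn't say how many were checked): if the true error rate were p, the chance of seeing zero errors in n checks is (1-p)^n, so at 95% confidence

    ```latex
    (1 - p)^n \ge 0.05 \quad\Longrightarrow\quad p \le 1 - 0.05^{1/n} \approx \frac{3}{n}
    ```

    So about 30 clean checks only rule out an error rate above roughly 10%, and you'd need about 300 to get down to roughly 1%.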
- zmmmmm 39 days ago
  Sure, but there's no reason there can't be a correlation between us "thinking" it's intelligent and it actually being intelligent. What other proxy should we use? I can't think of a scenario where it's actually intelligent but humans don't think it is that has a good practical ending. It's at least necessary even if it isn't sufficient.
- leecommamichael 39 days ago
  It’s trained to (lossy) compress large amounts of data. The system prompts have leaked and it’s just instructed to be helpful, right? I don’t entirely disagree with your sentiment, though. It’s brute force.
- yacthing 39 days ago
  '"AI" is literally models trained to make you think it's intelligent.'
  What's the difference? I try to make people think I'm intelligent all the time.
  [-]
  - bopbopbop7 39 days ago
    Weird self roast but okay.
- heliumtera 39 days ago
  Congratulations on that one!
  Now that you have unlocked this secret, you're cursed forever. They look at the machine and say: hey, look, the machine is just like me! You're left confused for the best part of 3 years and then you start realizing it was true all along...they are..very much similar to the machine. For a moment we were not surprised by how capable the machine was at reasoning. And then it dawned on us, the machine had human level intelligence and cognition from the beginning, just from a slightly different perspective.
- jennyholzer3 39 days ago
  The system prompt may vary but:
  "It's trained to make you think it's amazing and magical and therefore you think it's amazing and magical."
  is the dark pattern underlying the entire LLM hype cycle IMO.
camillomiller 39 days ago
Oh my God I’m so tired of this BS. No, you do not need all that to make a more than sustainable living as a programmer. This narrative is disgusting
6thbit 39 days ago
Sounds to me like Karpathy is in the "valley of despair" of the Dunning-Kruger effect of AI tools.
He knows the tools, he's efficient with them and yet he just now understands how much he's unable to harness at this point that makes him feel left behind.
Looking forward to seeing what comes out of him climbing that slope.
nen-nomad 39 days ago
Claude Code didn’t make me faster. It changed the calendar. What used to take me months now takes weeks. Work didn't vanish, the friction did.
Two years ago I was a human USB cable: copy, paste, pray. IDE <-> chat window, piece by piece. Now the loop is tighter. The distance is shorter.
There’s still hand-holding. Still judgment. Still cleanup. But the shift is real.
We’ve come a long way. And we’re not done.
[-]
- JDye 39 days ago
  Can't even write a comment without an LLM...
  [-]
  - llmslave2 39 days ago
    It's satire, the models are more subtle in their mannerisms.
  - nen-nomad 39 days ago
    Huh? Good eyes!! I forgot an /s at the end.
dnw 39 days ago
I have been using Copilot, Cursor, then CC for a little more than a year now. I have written code with teams using these tools and I am writing mostly for myself now. My observations have been the following:
1) These tools obviously improved significantly over the past 12 months. They can churn out code that makes sense in the context of the codebase, meaning there is more grounding to the codebase they are working on as opposed to codebases they have been trained on.
2) On the surface they are pretty good at solving known problems. You are not going to make them write a well-optimized renderer or an RL algorithm but they can write run-of-the-mill business logic better _and_ faster than I can-- if you optimize for both speed of production and quality.
3) Out of the box, their personality is to just solve the problem in front of them as quickly as possible and move on. This leads them to make suboptimal decisions (e.g. solving a deadlock by sleeping for 2 seconds, CC Opus 4.5 just last night). This personality can be altered with appropriate guidance. For example, a shortcut I use is to append "idiomatic" to my request-- "come up with an idiomatic solution" or "is that the most idiomatic solution we can think of." Similarly when writing tests or reviewing tests I use "intent of the function under test" which makes the model output a better solution or better code.
4) These models, esp. Opus 4.5 and GPT 5.2, are remarkable bug hunters. I can point at a symptom and they come away with the bug. I then ask them to explain to me why the bug happens and I follow the code to see if it's true. I have not come across a bad bug, yet. They can find deadlocks and starvations; you then have to guide them to a good fix (see #3).
5) Code quality is not sufficient to create product quality, but it is often necessary to sustain it. Sustainability window is shorter nowadays. Therefore, more than ever, quality of the code matters. I can see Claude Code slowly degrading in quality every single day--and I use it every single day for many hours. As much as it pains me to say this, compared to Opencode, Amp, and Toad I can feel the "slop" in Claude Code. I would love to study the codebases of these tools over time to measure their quality--I know it's possible for all but Claude Code.
6) I used to worry I don't have a good mental model of the software I build. Much like journaling, I think there is something to be said about the process of writing/making actually gives you a very precise mental model. However, I have been trying to let that go and use the model as a tool to query and develop the mental model post facto. It's not the same but I think it is going to be the new norm. We need tooling in this space.
7) Despite your own experiences with these tools it is imperative that they be in your toolbox. If you have abstained from them thus far, perhaps the best way to get them incorporated is by starting to use them for attending to your toil.
8) You can still handcraft code. There is too much fun, beauty and pleasure in it to deny doing it. Don't expect this to be your job. This is your passion.
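To make the deadlock example in point 3 concrete, here is a minimal sketch of the difference between a "sleep and hope" fix and a more idiomatic one. It is not the code from the session described above; the `Account` class and `transfer` function are hypothetical names invented for illustration. The idea shown is consistent lock ordering: if every thread acquires the two locks in the same global order, the circular wait that causes the deadlock cannot form, so no sleep is needed.

```python
import threading
from itertools import count

_ids = count()  # hypothetical global id source used only to order locks


class Account:
    """Hypothetical shared resource guarded by its own lock."""

    def __init__(self, balance):
        self.id = next(_ids)
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move money between accounts without risking a lock-order deadlock."""
    if src is dst:
        return  # nothing to do, and taking the same lock twice would hang
    # Always take the lock of the account with the smaller id first, so two
    # opposite transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_opposing_transfers(rounds=1000):
    """Run a->b and b->a transfers concurrently and return final balances."""
    a, b = Account(100), Account(100)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

A naive version that locks `src` first and `dst` second can hang when two threads transfer in opposite directions; inserting a sleep only makes the hang less likely, while the ordering rule removes it.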
[-]
- fallat 39 days ago
  I want to say that your comment has been the most real, aligned thing I've read in this post's comments. The articulation of what I've also seen and felt is perfect. Whoever else passes by, THIS is the truth. What dnw has written is the honest-to-god state of things, and it does not rob you of the passion of creating.
- flumpcakes 39 days ago
  > Despite your own experiences with these tools it is imperative that they be in your toolbox.
  Why is it imperative? Whenever I read comments like this I just think the author is cynically drumming up hype because of the looming AI bubble collapse.
  [-]
  - dnw 39 days ago
    Fair question. It is "imperative" for two reasons. First, despite having rough edges now, I find these tools to be actually useful, so they are here to stay. Second, I think most developers will use them and make them part of their toolchain. So, if one wants to be at parity with their peers, then it stands to reason they adopt these tools as well.
    In terms of bubbles: Bubbles are economic concepts and they will burst, but the underlying technology finds its market. There are plenty of good open source models and open source projects like OpenCode/Toad that support them. We can use those without contributing (too much) to the bubble.
  - kakapo5672 39 days ago
    There's a financial AI bubble for sure - that's pretty much a mainstream opinion nowadays. But that's an entirely different thing from AI itself bubble-collapsing.
    If you truly believe AI is simply going to collapse and disappear, you are deep in some serious cope and are going to be unpleasantly surprised.
LogicFailsMe 39 days ago
Countdown to his youtube course explaining it all for beginners commences...
[-]
- CamperBob2 39 days ago
  His "youtube course" already exists, and it's absolutely transformational.
  He's working on a more formal educational framework/service of some kind, which will presumably not be free, but what he's already posted is some of the most effective CS pedagogy I've ever encountered (and personally benefited from.)
  [-]
  - LogicFailsMe 39 days ago
    If he publishes something in this space he can just TAKE MY MONEY!
timcobb 39 days ago
whaaaat and this is the guy who coined "vibe-coding"? I am honestly pretty shocked reading this. I must be a fool or an idiot or both because I, for one, feel like suddenly I went from being a 1x developer to a 10x developer. Maybe 10x folks like Karpathy have it the opposite way?
ekropotin 39 days ago
If Karpathy feels behind, imagine how we regular folks feel
[-]
- furyofantares 39 days ago
  I've worked really hard over the last year at working out how to use these things, and it has more than paid off.
  But I think if I had started learning today instead of a year ago, I'd get up to speed in more like 6 months instead of a year. A lot of stuff I learned a year ago is not really necessary anymore, but furthermore, there's just a lot more information out there about how to use these from people who have been learning it on their own.
  I just don't think people who have ignored it up until now are really that far behind.
  [-]
  - d675 30 days ago
    how has it paid off for you?
kazinator 39 days ago
Guy is a wacko.
arisAlexis 39 days ago
Half of HN are in denial and other half in panic.
wordsaboutcode 41 days ago
i know how he feels :/
[-]
- halfmatthalfcat 39 days ago
  Let go of your AI gods and embrace the abyss. We've survived for decades without them and will survive in spite of them.
Gimpei 39 days ago
I think people need to chill out on this thread. LLMs are neither pure slop nor the end of the programming profession. They are immensely useful tools, particularly for tedious tasks or for quickly getting up to speed on a new API or syntax. They’re great for catching bugs too. Every now and again I’ll give an LLM a prompt and it will knock it out of the park, but that’s exceedingly rare. Most of the time, though, it just allows me to focus on the more interesting parts of my job. In short, for now at least, it is a big productivity booster, not a career ender.
endofreach 39 days ago
i do use LLMs a lot for programming recently. i do not use "agents" or any other new stuff. while i have always felt behind, i do not feel more behind now, not using agents, mcp etc.
maybe i am too ignorant and don't see what i am missing. and i am still writing code and enjoying it.
just the terminology of agents, vibe coding, prompt engineering etc is weirdly offputting to me.
ldng 41 days ago
Yeah. OR. You just ignore the bullshit until the bubble bursts. Then we'll see what's left and it will not be what the majority think.
[-]
- tayo42 39 days ago
  There seems to be a lot of churn, like how js was. We can just wait and see what the react of llms ends up being.
- falcor84 40 days ago
  The "bubble" is in the financial investment, not in the technology. AI won't disappear after the bubble bursts, just like the web didn't disappear after 2000. If anything, bursting the financial bubble will most likely encourage researchers to experiment more, trying a larger range of cheaper approaches, and do more fundamental engineering rather than just scaling.
  AI is here to stay, and the only thing that can stop it at this stage is a Butlerian jihad.
  [-]
  - design2203 39 days ago
    AI was here long before LLMs… also I dislike the people seemingly tying the two terms together as one.
  - lo_zamoyski 39 days ago
    Borg logic consists of framing matters of choice as "inevitable". As long as those with power convince everyone that technological implementation is "inevitable", people will passively accept their self-serving and destructive technological mastery of the world.
    The framing allows the rest of us to get ourselves off the hook. "We didn't have a choice! It was INEVITABLE!"
    And so, we have chosen.
    [-]
    - falcor84 39 days ago
      But history shows that it is inevitable. Can you give me an example of a single useful technology that humans ever stopped developing because of its negative externalities?
      > "We didn't have a choice! It was INEVITABLE!"
      There is no "we". You can call it the tragedy of the commons, or Moloch, or whatever you want, but I don't see how you can convince every single developer and financial sponsor on the planet to stop using and developing this (clearly very useful) tech. And as long as you can't, it's socially inevitable.
      If you want a practice run, see if you can stop everyone in the world from smoking tobacco, which is so much more clearly detrimental. If you manage that, you might have a small chance at stopping implementation of AI.
      [-]
      - andrekandre 39 days ago
        > see if you can stop everyone in the world from smoking tobacco
        this is a logical fallacy i think; nobody needs to stop tobacco full-stop, but we have been extremely successful at making it less and less incentivized/used over time, which is the goal...
        [1] https://www.lung.org/research/trends-in-lung-disease/tobacco...
  - ldng 40 days ago
    I maintain, the web today is not what people thought it would be in 1998. The tech has its uses, it's just not what snake oil sellers are making it out to be. And talking about Butlerian jihad is borderline snake oil selling.
    [-]
    - falcor84 40 days ago
      Interesting. What particular 1998 claims do you have in mind that were not (at least approximately) fulfilled?
  - wiseowise 39 days ago
    Not even Butlerian Jihad will stop the current progress at this point.
    [-]
    - lo_zamoyski 39 days ago
      Resistance is futile, eh?
boesboes 39 days ago
Bullshit
oakpond 42 days ago
> There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering.
Slop-oriented programming
alexcos 41 days ago
"I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind."
sora2video 39 days ago
[dead]
breve 42 days ago
[flagged]
[-]
- fooblaster 42 days ago
  It's clear from listening to podcasts/interviews that he does not want to say anything to get on Elon's bad side. Interviewers also appear not to be eager to broach the subject.
  [-]
  - breve 42 days ago
    If indeed he doesn't have the heart for basic honesty then why should anyone listen to him about anything?
    This is not a high bar. This is not some impossible moral standard to be held to.
    This really is an easy one.
    [-]
    - grim_io 42 days ago
      Being honest about self-driving AI gets you sued by the richest guy on earth.
      [-]
      - breve 41 days ago
        Everything has a cost. Fear of doing the right thing isn't worth it.
        [-]
        - grim_io 41 days ago
          That's a very cute power rangers way of saying it, but c'mon, it's easy to say when it's not you who is the target of a limitless revenge machine.
          [-]
          - breve 40 days ago
            You can embrace a nihilistic weakness or you can do the right thing. It's not complex.
            If Karpathy is genuinely compromised to the point where he can no longer do the right thing then no one should listen to him.
      - mrkeen 39 days ago
        Honesty is not the same thing as justified dishonesty.
        If you can paint him as a rational agent for lying, you can still be a rational agent and ignore his lies.
thomasfromcdnjs 41 days ago
I have been telling everybody I know over the Christmas break that I have been coding from around 10-36 years of age, as a career and always in my spare time as a hobby. I have a lacklustre computer science knowledge and never worked at the scale of FANG etc but am still rather confident in my understanding of code and the tech scene in general. I've been telling people I haven't "coded" for almost 6 months now, I only interface with agentic setups and only open my IDE to make copy and config changes.
I understand we are all in different camps for a multitude of reasons;
- The jouissance of rote coding and abstraction
- The tree of knowledge specifically in programming, and which branches and nodes we each currently sit at in our understanding
- Technical paradigms that humans may have argued about have now shifted to obvious answers for agentic harnesses (think something like TDD: I for one barely used that as a style because I've mostly worked in startups building apps and found the cost of my labour not worth it, but agentic harness loops absolutely excel at it)
- The geography and size of the markets we work in
- The complexity of the subject matter / domain expertise
- The cost-prohibitive nature of token-based programming (not everyone can afford it, and the big fish seemingly have quite the advantage going forth)
- Agentic coding has proven it can build UIs very easily, and depending on experience, it can build a great many things easily. It excels when it has feedback loops such as linting or simple javascript errors, which are observability problems in my opinion. Once it can do full stack observability (APM, system, network), its ability to reason and correct problems on the fly for any complex system seems overly easy from my purview.
- At the human nature level, some individuals prefer to think in 0's and 1's, some in words, some in between, and so on. What type of communication do agentic setups prefer?
With some of that above intuition that is easily up for debate, I've decided to lean 100% into agentic coding, I think it will be absolutely everywhere and obviously with humans in the loop but I don't think humans will need to review the pull requests. I am personally treating it as an existential threat to my career after having seen enough of what it's capable of. (with some imagination and a bit of a gambling spirit, as us mere mortals surely can't predict the future)
With my gambit, I'm not choosing to exit the tech scene and instead optimistically investing my mental prowess into figuring out where "humans in the loop" will be positioned. Currently I'm looking into CI level tooling, the known being code quality, and all the various forms of software testing paradigms. The emerging evals, in my mind, will keep evolving and will do a lot more than test our ideas of model intelligence and chat bot responses.
---
A more practical rant: If you are building a recommendation engine for A and B, the engine could have X amount of modules that each return a score which, when all combined, make up the final decision between A and B. Forgive me but let's just use dating as an example. A product manager would say we need a new module to calculate relevance between A and B based off their food preferences. An agentic harness can easily code that module and create the tests for it. The product manager could ask an LLM to make a list of 1000 reasons why two people might be suitable for dating. The agent could easily go away and code and test all those modules and probably maintain technical consistency but drift from the company's philosophical business model. I am looking into building "semantic linting" for codebases: how can the agent maintain the code so it aligns with the company's business model? And if for whatever reason those 1000 modules need to be refactored, how can the agent maintain the code so it aligns with the company's business model? Essentially trying to make a feedback loop between the company's needs and the code itself. To stop the agent and the business from drifting in either direction, and allowing for automatic feedback loops for the agent to fix them. In short, I think there will be new tools invented that us humans will be mastering, as per Karpathy's point.
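For readers who want to picture the module-based engine described above, here is a rough sketch under stated assumptions. Every name in it (`overlap_module`, `combined_score`, the `foods` and `hobbies` keys, the weights) is hypothetical and invented for illustration; the comment does not specify how scores are combined, so a weighted average is assumed here. Each module is just a function that takes two profiles and returns a score between 0 and 1.

```python
from typing import Callable, Dict, List, Set, Tuple

Profile = Dict[str, Set[str]]                 # hypothetical profile shape
Module = Callable[[Profile, Profile], float]  # one scoring module


def overlap_module(key: str) -> Module:
    """Build a module that scores Jaccard overlap on one profile field."""

    def score(a: Profile, b: Profile) -> float:
        items_a, items_b = a.get(key, set()), b.get(key, set())
        union = items_a | items_b
        # No data on either side means no evidence, scored as 0.0 here.
        return len(items_a & items_b) / len(union) if union else 0.0

    return score


def combined_score(a: Profile, b: Profile,
                   modules: List[Tuple[Module, float]]) -> float:
    """Weighted average of all module scores (an assumed combination rule)."""
    total_weight = sum(weight for _, weight in modules)
    if total_weight == 0:
        return 0.0
    return sum(module(a, b) * weight for module, weight in modules) / total_weight


# Adding the "food preferences" module is one more entry in this list.
MODULES: List[Tuple[Module, float]] = [
    (overlap_module("foods"), 1.0),
    (overlap_module("hobbies"), 1.0),
]

alice: Profile = {"foods": {"sushi", "pizza"}, "hobbies": {"chess"}}
bob: Profile = {"foods": {"pizza", "tacos"}, "hobbies": {"chess"}}
match = combined_score(alice, bob, MODULES)
```

With a structure like this, an agent can add and test modules cheaply, which is exactly why the number of modules can grow faster than anyone checks whether they still fit the business model.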
[-]
- anothereng 39 days ago
  Interesting. How can I get into building agents? I have the Kiro IDE for a project, but how can I make sure what they're doing is correct? Right now I'm just vibecoding or using the more detailed requirements path, but I haven't used coding agents because I actually don't get how the feedback loop works with them.
lo_zamoyski 39 days ago
If you want to chase the mob off the cliff, go ahead. Insanity and stupidity aren't sound life strategies, though. They're a sign you have lost the plot.