Two kinds of AI users are emerging. The gap between them is astonishing

(martinalderson.com)

43 points | by martinald 2 hours ago

15 comments

danpalmer 1 hour ago
I've noticed a huge gap between AI use on greenfield projects and brownfield projects. The first day of working on a greenfield project I can accomplish a week of work. But the second day I can accomplish a few days of work. By the end of the first week I'm getting a 20% productivity gain.
I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move quickly, as before.
The interesting bit is going to be whether we see AI being used to mature those small systems into big, complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, particularly while still moving. I've not seen any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc. along a well-defined set of rails.
- data-ottawa 38 minutes ago
  It’s fantastic to be able to prototype small-to-medium complexity projects, figure out which architectures work and which don’t, then build on a stable foundation.
  That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.
  - johnrob 19 minutes ago
    I’ve done this in pure Python for a long time. Single file prototype that can mostly function from the command line. The process helps me understand all the sub problems and how they relate to each other. Best example is when you realize behaviors X, Y, and Z have so much in common that it makes sense to have a single component that takes a parameter to specify which behavior to perform. It’s possible that already practicing this is why I feel slightly “meh” compared to others regarding GenAI.
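johnrob's consolidation step can be sketched as follows. The three "behaviors" here (comma-, tab-, and pipe-delimited export) and the name `export_rows` are hypothetical stand-ins for X, Y, and Z, picked only because they share almost all of their logic.

```python
from typing import Iterable, Sequence

# Hypothetical behaviors X, Y, Z: they differ only in the separator they use.
DELIMITERS = {"csv": ",", "tsv": "\t", "pipe": "|"}


def export_rows(rows: Iterable[Sequence[object]], fmt: str) -> str:
    """Render rows as delimited text; `fmt` selects which behavior to perform.

    No quoting or escaping is done, so cells must not contain the separator.
    """
    try:
        sep = DELIMITERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return "\n".join(sep.join(str(cell) for cell in row) for row in rows)
```

Adding a fourth behavior is then one dictionary entry rather than a fourth copy of the function; the trade-off is that behaviors which later diverge have to be split apart again.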
smuhakg 53 minutes ago
> On one hand, you have Microsoft's (awful) Copilot integration for Excel (in fairness, the Gemini integration in Google Sheets is also bad). So you can imagine financial directors trying to use it and it making a complete mess of the most simple tasks and never touching it again.
Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.
Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.
This is karmic justice imo.
- QuantumGood 39 minutes ago
  Tim Berners-Lee thought pages would become machine-readable long ago, with "obvious" benefits, and that idea partly drove XML, RDF and HTML 5. Now the benefit of doing so seems even bigger (but are they?), and the time spent making existing documents AI readable seems to keep growing.
- martinald 45 minutes ago
  Totally agree, though ironically Claude Code works way better with Excel than I expected.
  On one attempt I even told Copilot to convert each sheet to a CSV THEN do the calculations. It just ignored that and failed miserably, ironically giving me a list of the files it should have made, along with the broken Python script. I found this very amusing.
defrost 1 hour ago
The "upside" description:
> On the other you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.
> I helped one recently almost one-shot converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.
> Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards and have Claude Code work with you to really integrate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
almost makes me physically sick.
I've a reasonably intense math background corrupted by application to geophysics and implementing real world numerical applications.
To be fair, this statement alone:
> 30 sheet mind numbingly complicated Excel financial model
makes my skin crawl and invokes a flight reflex.
Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.
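The quoted article says that once the model is in Python, Monte Carlo simulation is easy. As a rough illustration only, here is what that looks like with a toy stand-in: `project_profit`, its inputs, and every distribution below are invented, not taken from the executive's spreadsheet. The simulation inherits whatever errors the ported model contains, which is exactly defrost's worry.

```python
import random
import statistics


def project_profit(units: float, price: float, unit_cost: float, fixed_cost: float) -> float:
    """Hypothetical one-line 'model' standing in for logic ported from a spreadsheet."""
    return units * (price - unit_cost) - fixed_cost


def simulate(n: int = 10_000, seed: int = 42) -> list[float]:
    """Draw uncertain inputs n times and record the model's output for each draw."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = []
    for _ in range(n):
        # Assumed input distributions, purely illustrative.
        units = rng.gauss(10_000, 1_500)
        price = rng.uniform(9.0, 11.0)
        unit_cost = rng.triangular(5.0, 8.0, 6.0)  # low, high, mode
        results.append(project_profit(units, price, unit_cost, fixed_cost=25_000))
    return results


outcomes = simulate()
cuts = statistics.quantiles(outcomes, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
p5, p50, p95 = cuts[0], cuts[9], cuts[18]
loss_prob = sum(o < 0 for o in outcomes) / len(outcomes)
print(f"5th pct {p5:,.0f}  median {p50:,.0f}  95th pct {p95:,.0f}  P(loss) {loss_prob:.1%}")
```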
- majormajor 1 hour ago
  One of the dirty secrets of a lot of these "code adjacent" areas is that they have very little testing.
  If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.
  If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing the process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant Excel doc? I'm assuming no, but maybe I'm wrong.
  It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness like that sort of financial modeling.
  - tharkun__ 52 minutes ago
    This is a pet peeve of mine at work.
    Any, and I mean any, statistic someone throws at me, I will try to dig into. And if I'm able to, I will usually find that something is very wrong somewhere: either the underlying data is just wrong, invalidating the whole thing, or the data is reasonably sound but the person doing the analysis makes incorrect assumptions about parts of it and then draws incorrect conclusions.
    - aschla 46 minutes ago
      It seems to be an ever-present trait of modern business. There is no rigor, probably partly because most business professionals have never learned how to properly approach and analyze data.
      Can't tell you how many times I've seen product managers making decisions based on a few hundred analytics events, trying to glean insight where there is none.
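To put rough numbers on aschla's point about a few hundred analytics events, here is a 95% Wilson score interval for a conversion rate. The A/B counts below are made up for illustration.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% (z = 1.96) Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Hypothetical A/B result with 300 events per variant.
a = wilson_interval(24, 300)  # 8.0% observed
b = wilson_interval(33, 300)  # 11.0% observed
print(f"A: {a[0]:.1%} to {a[1]:.1%}   B: {b[0]:.1%} to {b[1]:.1%}")
```

With 300 events per variant, each observed rate is only pinned down to within roughly 3 to 3.5 percentage points either way, about the same size as the 3-point difference someone might be tempted to act on.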
- decimalenough 1 hour ago
  I'm almost certain it will be significantly worse.
  The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.
  The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of the edge cases wrong, and when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.
  - defrost 59 minutes ago
    I won't disagree - I suffered from insufficient damning praise in my last sentence above.
    IMHO, earned through years of bleeding eyeballs, the first will be riddled with subtle edge cases curiously patched and fettled such that it'll limp through to the desired goal .. mostly.
    The automated AI assisted transcoding will be ... interesting.
- ChrisMarshallNY 57 minutes ago
  Obligatory xkcd: https://xkcd.com/1667/
decimalenough 1 hour ago
> I helped one recently almost one-shot[3] converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.
I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.
- linsomniac 1 hour ago
  It depends on how easily testable the Excel is. If Claude has the ability to run both the Excel and the Python with different inputs, and check the outputs, it's stunningly likely to be able to one-shot it.
  - AlotOfReading 55 minutes ago
    Something being simultaneously described as a "30 sheet, mind-numbingly complex Excel model" and "testable" seems somewhat unlikely, even before we get into whether Claude will be able to test such a thing before it runs into context length issues. I've seen Claude hallucinate running test suites before.
    - martinald 48 minutes ago
      It compacted at least twice but continued with no real issues.
      Anyway, please try it if you find it unbelievable. FWIW, I didn't expect it to work as well as it did. Opus 4.5 is pretty amazing at long-running tasks like this.
      - moregrist 33 minutes ago
        I think the skepticism here is that without tests or a _lot_ of manual QA how would you know that it did it correctly?
        Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that.
        Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.
      - stavros 24 minutes ago
        I generally agree with you, but I tried to get it to modernize a fairly old SaaS codebase, and it couldn't. It had all the code right there, all it had to do was change a few lines, upgrade a few libraries, etc, but it kept getting lots of things wrong. The HTML was wrong, the CSS was completely missing, basic views wouldn't work, things like that.
        I have no idea why it had so much trouble with this generally easy task. Bizarre.
  - martinald 1 hour ago
    That's exactly what it did (author here).
    - majormajor 54 minutes ago
      I'm having trouble reconciling "30 sheet mind numbingly complicated Excel financial model" and "Two or three prompts got it there, using plan mode to figure out the structure of the Excel sheet, then prompting to implement it. It even added unit tests to the Python model itself, which I was impressed with!"
      "1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests either a massively higher level of granularity than Opus's initial plans on existing codebases give me, or a less-than-expected level of Excel craziness.
      And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or surprising?
      - martinald 50 minutes ago
        No, it didn't make a giant plan of every detail. It made a plan of the core concepts, and then in implementation mode it kept checking the Excel file to get more info. It took around 30 mins in implementation mode to build it.
        I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.
        - majormajor 47 minutes ago
          Ah, I see.
          Did it build a test suite for the Excel side? A fuzzer or such?
          It's the cross-concern interactions that still get me.
          80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).
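What linsomniac and majormajor describe (run the original and the port on many inputs and compare) can be sketched as a small differential fuzz harness. Everything here is hypothetical: `reference_model` stands in for values the spreadsheet would produce (actually recalculating a workbook and reading results back is not shown), `ported_model` stands in for the Python translation, and the port deliberately omits an edge case so the harness has something to find.

```python
import math
import random


def reference_model(revenue: float, growth: float, years: int) -> float:
    """Stand-in for the spreadsheet: sums a growing revenue stream year by year."""
    total = 0.0
    for year in range(years):
        total += revenue * (1 + growth) ** year
    return total


def ported_model(revenue: float, growth: float, years: int) -> float:
    """Stand-in for the port: closed-form geometric series.

    Deliberately missing the growth == 0 case, a typical edge-case slip.
    """
    return revenue * ((1 + growth) ** years - 1) / growth


def fuzz(n_cases: int = 5_000, seed: int = 0, rel_tol: float = 1e-9) -> list:
    """Compare both models on random inputs; return every case that disagrees or crashes."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        growth = rng.choice([0.0, rng.uniform(-0.5, 0.5)])  # about half the cases hit growth == 0
        args = (rng.uniform(0.0, 1e6), growth, rng.randint(0, 40))
        expected = reference_model(*args)
        try:
            actual = ported_model(*args)
        except Exception as exc:  # a crash counts as a mismatch
            failures.append((args, expected, repr(exc)))
            continue
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=1e-6):
            failures.append((args, expected, actual))
    return failures


failures = fuzz()
print(f"{len(failures)} mismatches; first: {failures[0] if failures else None}")
```

A harness like this only checks that the port matches the original; as lmm notes further down, any bug already in the spreadsheet is faithfully preserved.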
- Spivak 1 hour ago
  Doesn't it help you sleep at night that your 401k might be managed by analysts #yoloing their financial modeling tools with an LLM?
  - DaedalusII 38 minutes ago
    having worked in large financial institutions, this would be a step improvement
    the largest independent derivatives broker in Australia collapsed after it was discovered the board were using astrology and magicians to gamble with all the clients' money
    https://www.abc.net.au/news/2016-09-16/stockbroker-used-psyc...
simmerup 1 hour ago
Terrifying that people are creating financial models with AI when they don’t have the skills to verify the model does what they expect
- martinald 1 hour ago
  They have an excel sheet next to it - they can test it against that. Plus they can ask questions if something seems off and have it explain the code.
  - AlotOfReading 1 hour ago
    I'm not sure being able to verify that it's vaguely correct really solves the issue. Consider how many edge cases inhabit a "30 sheet, mind-numbingly complicated" Excel document. Verifying equivalence sounds nontrivial, to put it mildly.
  - lmm 1 hour ago
    > They have an excel sheet next to it - they can test it against that.
    It used to be that we'd fix the copy-paste bugs in the excel sheet when we converted it to a proper model, good to know that we'll now preserve them forever.
  - karlgkk 1 hour ago
    [flagged]
    - yomismoaqui 1 hour ago
      You would be surprised at the volume of money made by businesses supported by Excel.
      - martinald 1 hour ago
        Yes. I suspect there are thousands of Excel files that "process" >$1bn/yr out there.
- nebula8804 1 hour ago
  All we need is one major crash caused by AI to scare the capital owners. Then maybe we white-collar workers can breathe for at least a few more years (maybe a decade+).
- myfakebadcode 59 minutes ago
  I’m trying to learn Rust coming from Python (for fun). I use various LLMs for Python and see them stumble.
  It is a beautiful experience to realize wtf you don’t know and how far over their skis so many will get trusting AI. The idea of deploying a Rust project at my level of ability with an AI at the helm is terrifying.
- derrida 44 minutes ago
  Business as usual.
- mkoubaa 1 hour ago
  It's not terrifying at all; some shops will fail and some will succeed, and in the aggregate it'll be no different for the rest of us.
s-lambert 1 hour ago
I don't see a divergence, from what I can tell a lot of people have only just started using agents in the past 3-4 months when they got good enough that it was hard to say otherwise. Then there's stuff like MCP, which never seemed good and was entirely driven by people who talked more about it than used it. There also used to be stuff like langchain or vector databases that nobody talks about anymore, maybe they're still used but they're not trendy anymore.
It seems way too soon to really narrow down any kind of trends after a few months. Most people aren't breathlessly following the next twitter trend, give it at least a year. Nobody is really going to be left behind if they pick up agents now instead of 3 months ago.
- neom 32 minutes ago
  Not sure how much falling behind there is even going to be. I'm an old-school Linux type with D- programming skills, yet getting going building things has been ridiculously easy. The swarms thing makes it so fast. I've churned out 2 small but tested apps in 2 weekends just chatting with Claude Code; the only thing I had to do was configure the servers.
wrs 44 minutes ago
Some minor editing to how this would have been written in the mid-1980s:
“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 1-2-3] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”
The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.
ed_mercer 1 hour ago
> Microsoft itself is rolling out Claude Code to internal teams
Seems like Nadella is having his Ballmer moment
- running101 56 minutes ago
  Code red moment
- fdsf2 1 hour ago
  Nothing but ego, frankly. Apple had no problem settling for a small market share back in the day... look where they are now. It didn't come from make-believe and fantasy scenarios of the future based on an unpredictable technology.
with 43 minutes ago
> The bifurcation is real and seems to be, if anything, speeding up dramatically. I don't think there's ever been a time in history where a tiny team can outcompete a company one thousand times its size so easily.
Slightly overstated. Tiny teams aren't outcompeting because of AI, they're outcompeting because they aren't bogged down by decades of technical debt and bureaucracy. At Amazon, it will take you months of design, approvals, and implementation to ship a small feature. A one-man startup can just ship it. There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.
DavidPiper 30 minutes ago
> To really underline this, Microsoft itself is rolling out Claude Code to internal teams, despite (obviously) having access to Copilot at near zero cost, and significant ownership of OpenAI. I think this sums up quite how far behind they are
I think it sums up how thoroughly they've been disrupted, at least for coding AIs (independent of like-for-like quality concerns rightly mentioned elsewhere in this thread re: Excel/Python).
I understand ChatGPT can do like a million other things, but so can Claude. Microsoft deliberately using competitors internally is the thing that their customers should pay attention to. Time to transform "Nobody gets fired for buying Microsoft" into "Nobody gets fired for buying what Microsoft buy", for those inclined.
drsalt 46 minutes ago
what is the source data? the author says they've seen "far more non-technical people than I'd expect using Claude Code in terminal" so like, 3 people? who are these people?
- chadcmulligan 26 minutes ago
  It's always the same: for a field with so much money in it, the scarcity of facts or examples is strange. I can't help but think there's a lot of something going on. Business guys converting Excel into Python scripts doesn't seem like a good idea to me; the room for error is huge.
  I would love to see just one example from one of these companies of - Here's an application we want to build, here's the prompts we use, here's the output, here's the application. You'd think this would exist.
Havoc 1 hour ago
The copilot button in excel at my work can’t access the excel file of the window it’s in. As in “what’s in cell A1” and it says I can’t read this file. Not even sure what the point is then frankly.
I’m happily vibe coding at work but yeah article is right. MS has enterprise market share by default not by merit. Stunning contrast between what’s possible and what’s happening in big corp
- bwat49 48 minutes ago
  yeah I actually use AI a lot, but copilot is... useless. When microsoft adds copilot to their various apps they don't seem to put any thought/effort behind it beyond sticking a copilot button somewhere.
  And if the copilot button does nothing but open a chat window without any real integration with the app, what the hell is the point of that when there's already a copilot button in the windows taskbar?
- cmrdporcupine 1 hour ago
  Meanwhile the people I know who work at Microsoft say there's a constant whip-cracking to connect everything they're doing to "AI" and prove that's what they're doing.
superkuh 1 hour ago
The argument seems to be that having a corporation restrict your ability to present arbitrary text directly to the model and only being able to go through their abstract interface which will integrate your text into theirs (hopefully) is more productive than fully controlling the input text to a model. I don't think that's true generally. I think it can be true when you're talking about non-technical users like the article is.
- majormajor 1 hour ago
  The usefulness of specialized interfaces is apparent if you compare Photoshop with Gemini Pro/Nano Banana for targeted image editing.
  I can select exactly where I want changes and have targeted element removal in Photoshop. If I submit the image and try to describe my desired changes textually, I get less easily-controllable output. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)
  I think this sort of task-specific specialization will have a long future; it's hard to imagine pure text once again being the dominant information-transfer method for 90% of the things we do with computers after 40 years of building specialized non-text interfaces.
  - duskwuff 54 minutes ago
    One reasonable niche application I've seen of image models is in real estate, as a way to produce "staged" photos of houses without shipping in a bunch of furniture for a photo shoot (and/or removing a current tenant's furniture for a clean photo). It has to be used carefully to avoid misrepresenting the property, of course, but it's a decent way of avoiding what is otherwise a fairly toilsome and wasteful process.
    - majormajor 42 minutes ago
      This sort of thing (not for real estate, but for "what would this furniture actually look like in this room") is definitely somewhere the open-ended interface is fantastic vs. targeted removal in Photoshop (though it could also easily be integrated into a Photoshop-like tool to let me be more specific about placement and such).
      I was a bit surprised that it still produced gibberish text on posters in the background, in an unaffected part of the image that at first glance didn't change at all. So even just a GUI's "masking" ability, like "anything outside of this range should not be touched", would be a godsend.
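The masking guarantee majormajor asks for is simple to state in code: after the model edits an image, copy the original pixels back everywhere outside the user's selection. A minimal sketch on a grayscale image stored as nested lists; the function name `composite` is hypothetical, and a real tool would use an image library and soft-edged masks.

```python
Image = list[list[int]]   # grayscale pixels, row-major
Mask = list[list[bool]]   # True where edits are allowed


def composite(original: Image, edited: Image, mask: Mask) -> Image:
    """Keep edited pixels only inside the mask; restore the original everywhere else."""
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("images and mask must have the same height")
    out = []
    for orig_row, edit_row, mask_row in zip(original, edited, mask):
        if not (len(orig_row) == len(edit_row) == len(mask_row)):
            raise ValueError("images and mask must have the same width")
        out.append([e if m else o for o, e, m in zip(orig_row, edit_row, mask_row)])
    return out
```

For example, `composite([[1, 2], [3, 4]], [[9, 9], [9, 9]], [[True, False], [False, False]])` keeps only the top-left edit, so stray changes to background posters could never survive outside the selection.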
  - fdsf2 1 hour ago
    It baffles me that Gemini et al. don't have these standard video editing tools. Do the engineers seriously think text prompting is how people want videos to be generated? Nope. People want to customise. E.g., check out CapCut in the context of social media.
    I've been trying to create a quick-and-dirty marketing promo via an LLM to visualise how a product will fit into people's world, and it is incredibly painful to 'hope and pray' that refining the prompt via text will make slight adjustments come through.
    The models are good enough if you are half-decent at prompting and have some patience. But given the amount invested, I would argue they are pretty disappointing. I've had to chunk the marketing promo into an almost frame-by-frame play to make it somewhat work.
    - suprstarrd 1 hour ago
      Speaking as someone who doesn't like the idea of AI art, so take my words with a grain of salt: my theory is that this input-method exclusivity is intentional on their part, for exactly the reason you want it changed. If you only let people making AI art communicate what they want through text or reference attachments (the latter of which they usually won't have), then they have to spend time figuring out how to put it into words. It IS painful to ask for those refinements, because any human would clearly understand them. In the end, those people get to say that they spent hours, days, or weeks refining "their prompt" to get a consistent and somewhat-okay-looking image; the engineers get to train their AI to better understand the context of what someone is saying; and all the while the company gets to further legitimize a false art form.