I used Claude Code to get a second opinion on my MRI

(antoine.fi)

469 points | by engmarketer 19 hours ago

100 comments

AceJohnny2 18 hours ago
> There's something incredibly peaceful about being in the hands of an expert you trust. [...] AI can absolutely shatter that feeling in an uncomfortable way [...] but I don't know if I can fully trust AI either.
This really is key. We know we can't trust the AI, but at the same time we're also more comfortable asking the AI for clarifications or confronting it. Not having a time-bound appointment or paying by the hour helps a lot. But even then, more information doesn't necessarily help!
I once brought my 11-year-old car, a Civic with 150k miles, to multiple garages. I figured I'd play the "second opinion" game to correlate what the garages recommended to decide on what needed to be done...
I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
[-]
- Aurornis 17 hours ago
  I have multiple LLM subscriptions at any given time, plus an array of local models.
  When I ask a question outside of my domain of expertise I like to ask all of the LLMs I have access to. I also create separate sessions and ask the same question multiple ways.
  It’s revealing to see how many different and contradictory answers I get, most of which are presented confidently.
  The last time I ran a medical question through Claude I couldn’t even get consistent answers between sessions.
  It’s also scary how easily you can lead each LLM to the answer you have in mind. When I would start asking questions about different options that other LLMs had presented, each session would drift toward that explanation.
  [-]
  - marcus_holmes 9 hours ago
    In my day job we tried creating a credit assessor tool using an LLM as the credit assessor.
    It did great, generated a report on the assessed business that was incredibly detailed and plausible.
    Then I started running tests and getting into the details, and found that if you ran the same report on the same data, it generated completely different, still very plausible, results. I could run the same source data through the assessment process 10 times and get 10 very different results. We had to can the project and go a different route.
    LLMs are designed to produce plausible results, not factual results. We can fix this when using them for software dev by using linters and tests (though we've all had the experience where the LLM invents an API endpoint). I would not trust raw LLM output in any situation where that kind of testing and verification capability isn't present.
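    A minimal sketch of the repeat-run check described above: feed identical input through the same assessment several times and count how many distinct outputs come back. `repeatability`, `stable_assessor`, and `drifting_assessor` are hypothetical names made up for illustration; the drifting stub only simulates run-to-run variation and stands in for a real LLM call.

```python
from collections import Counter
from itertools import count


def repeatability(assess, data, runs=10):
    """Run the same assessment repeatedly on identical input and tally the outputs."""
    outputs = [assess(data) for _ in range(runs)]
    return Counter(outputs)


# Hypothetical stand-in: a deterministic assessor that always returns the same report.
def stable_assessor(data):
    return "score=%d" % sum(data)


_calls = count()


# Hypothetical stand-in: an assessor whose output changes on every call.
# A real LLM call would go here instead.
def drifting_assessor(data):
    return "score=%d" % (sum(data) + next(_calls))


stable = repeatability(stable_assessor, (1, 2, 3))
drifting = repeatability(drifting_assessor, (1, 2, 3))

# One distinct output means the process is repeatable; many means it is not.
print(len(stable), len(drifting))
```

    A check like this only measures consistency, not correctness: an assessor can return the same wrong answer every time.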
    [-]
    - Suppafly 5 hours ago
      What's crazy is that there are a ton of businesses building processes around LLMs that haven't done this exercise and fully believe the LLM is giving them accurate data.
    - xbmcuser 5 hours ago
      Yup, I use LLMs to write scripts that process the data for me; I don't ask the LLMs to process the data themselves. Even when I wrote something for my day trading, I used an LLM to write scripts that do all the processing and predict price movement from that. The more the data is pre-processed, the more all the LLMs come up with similar trades.
    - adamddev1 6 hours ago
      Linters and tests help of course, but they cannot "fix" the problem since tests cannot prove the absence of bugs.
      [-]
      - marcus_holmes 6 hours ago
        agree, and I think we'll see more use of formal methods with LLMs for this reason
        [-]
        ryukoposting 6 hours ago
        At a certain point it just feels like we're reinventing the concept of programming languages from first principles.
        [-]
        ncruces 3 hours ago
        I'm using this new programming language: it's called LLM prompting, and everything is undefined behavior.
        marcus_holmes 3 hours ago
        Hardly the first time we've done this - we had to do it with compilers too
  - dirkt 7 hours ago
    What happened to VERIFYING an answer? Does nobody do that anymore?
    When I ask an LLM, I trace the sources, and see if they make sense.
    More often than not the sources don't actually say anything about the topic in particular...
    > It’s also scary how easily you can lead each LLM to the answer you have in mind.
    Exactly. Which is why "treat an LLM like a human expert who can answer your question" doesn't work. It's more like a human bullshitter who makes up convincing-looking answers, and tries to please you. If the answers actually have some grounding in the training material, that's useful as some kind of holistic Google, but often it's not.
    [-]
    - palata 4 hours ago
      > What happened to VERIFYING an answer? Does nobody do that anymore?
      The problem with medical advice is that you may not be competent to verify the answer, right?
      I agree that asking 5 LLMs to vote and trusting the answer is totally the wrong approach, of course. But LLMs (and traditional material) can help you get more informed. For instance, instead of going to your doctor with the LLM diagnosis and trying to convince the doctor that the LLM is right, you can try to build your own understanding of the problem and go ask the doctor to explain to you what you understood correctly and what you misunderstood.
      If you have some understanding, it's harder for a specialist to bullshit you. But you need your own critical thinking and you need to put effort into actually learning something; blindly trusting and repeating what LLMs say doesn't help.
    - prmph 3 hours ago
      I've also noticed the opposite problem: Sometimes the LLM, when asked a detailed question (probably with some lead-in), pushes back in a way that betrays that they fell back to general tropes without really considering the nuances of your specific context.
      This happens many times, and I usually have to lead the LLM through a chain of reasoning to prove to it that its objections, though generally sound, do not apply to my specific situation.
      Someone not as well versed in the subject matter would think the LLM found a smoking gun (which they love to do), and be led on a wild goose chase.
      [-]
      - wizzwizz4 1 hour ago
        > I usually have to lead the LLM through a chain of reasoning to prove to it
        What's the point of doing this?
        [-]
        prmph 49 minutes ago
        So that hopefully we can go further in the discussion without it having to repeatedly bring up those (discredited) objections.
        But it does forget, and I'd have to prime it again for another session.
    - mathieuh 4 hours ago
      As you say, often you check up on the LLM's "reasoning" and it doesn't follow at all, or you can easily get it to contradict itself with just as much certainty as it had about its previous convictions.
      It is very scary to me that people are entrusting potentially life-altering decisions to these things.
    - otabdeveloper4 6 hours ago
      > When I ask an LLM, I trace the sources, and see if they make sense.
      Professional tip: you can cut out the LLM middleman here and save a lot of time and money.
      [-]
      - microgpt 5 hours ago
        What would you use then? Google Search, which is just a shittier LLM?
  - base698 1 hour ago
    My step mom was having debilitating pain. A year of going to doctors and no one was able to find a cause. I scanned her discharge paperwork, which had her prescriptions on it, and gave it to Claude. It identified a prescription that had that exact side effect. They later confronted her primary care doctor, who concurred and took her off it.
    A friend of mine's wife recently passed. They were chasing a suspected heart defect for over a year. She had been intermittently fainting. At about the year mark they decided to scope her digestive tract. They found bleeding ulcers from cancer that was all over her body. I input her fainting symptoms into Claude and gastro impact was number two suspected after heart issues.
    I have a few other cases it's helped with. I'm not sure it could do worse than my own experience with the medical system. This is doubly true in places that lack any sort of medical care.
  - palata 4 hours ago
    > It’s also scary how easily you can lead each LLM to the answer you have in mind.
    Scary in this context of course, but I find that it is an interesting thought for coding: it suggests that maybe a developer who knows what they are doing will end up leading the LLM to code something that makes more sense than a developer who doesn't know and just vibe-codes blindly.
    Sounds pretty obvious, but I wanted to say it.
    [-]
    - ncruces 3 hours ago
      And all it takes is not blindly accepting the first thing it spews if you suspect there's a better answer (and are in a position to evaluate that better answer).
  - Esophagus4 16 hours ago
    Have you ever let the LLMs “discuss” with each other to see if that would give better answers?
    You might end up with the answer from the most persuasive LLM, but you might also end up with better results.
    Wonder if there is a paper out there on this.
    [-]
    - scheme271 15 hours ago
      The problem is how do you know whether the answer is just the most persuasive or actually the most accurate one? It's hard to figure this out without domain knowledge.
      [-]
      - sizzle 8 hours ago
        Take the output to a Radiologist and verify the veracity of the statements.
        [-]
        RussianCow 6 hours ago
        At that point, cut out the LLM and just see the radiologist.
        tuxguy 5 hours ago
        There is often discordance between radiologists (and doctors in general) when reading the same scan (same case vignette) as well!
        [-]
        yen223 4 hours ago
        Do people here not realise that "second opinions" are a thing because humans disagree with each other when presented with the same case all the time? It's not just an LLM thing!
        marcta 6 hours ago
        Why should a radiologist have to debunk AI slop? They have enough to do already. That's the same mentality that is frustrating open-source repositories with sloppy pull requests, and saying "here, sort this out for me".
        [-]
        dumb1224 3 hours ago
        Depending on the disease, even in cancer there's myeloma, which may cause bone metastasis in many parts of the body with very focal lesions. Radiologists can't assess each and every one of them, or even find them all. So AI can definitely help in these scenarios.
        [-]
        wizzwizz4 1 hour ago
        And that AI will not be fancy autocomplete: it will be some kind of image classifier that is not trained on Reddit.
      - Esophagus4 9 hours ago
        I dunno, I could see it working.
        I do something similar with reviewing code: I have one agent write the code and another reviews it, then they go back and forth for a bit improving the code. Seems to yield better results than one agent alone.
        Seems like a similar principle.
        [-]
        scheme271 9 hours ago
        The difference is that in the code situation, you can run unit tests on the code, compile it, etc. Unless your LLMs are ordering diagnostics and reviewing the results, there is no further information that the LLMs have on the situation. Having a second LLM review the first is counterproductive: if the 2nd LLM is better, why not use it directly? If not, then what prevents it from sending the first on some incorrect tangent?
        [-]
        RussianCow 6 hours ago
        Also, there are multiple "correct" ways to code something, so imperfect code that solves the problem is still useful. A medical diagnosis is either correct or incorrect.
      - XorNot 13 hours ago
        Worse is that LLMs are trained to be persuasive by default. The "you're absolutely right..." stereotype is because these things are A/B tested on response quality and we know from studies people reliably rate vibes better than anything else - e.g. while the quality of hospital accommodations likely has some impact on patient outcomes, the view and decor of the room certainly did not fundamentally change the quality of the care provided but it is the largest determinant in how well people rate that care.
    - mncharity 7 hours ago
      With direct discussion, the same tendency to harmonize towards groupthink applies.
      Aside from the statelessness GP mentioned, one can insert anti-conciliatory intermediation. "I saw a random claim go by, but something about it seems not quite right. What am I missing? They said: [...]." Weaponizing the bias, and orchestrating the discourse from the harness.
    - cadamsdotcom 15 hours ago
      The problem with trying to write a paper is the results depend on RNG.
      [-]
      - red75prime 6 hours ago
        Run it with temperature 0 if you want to minimize randomness. Sampling from a probability distribution is not a problem by itself. The problem is when the probability distribution prioritizes wrong answers.
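        To make the temperature point concrete, here is a minimal sketch of temperature-scaled softmax over a toy set of answer scores. The function name and the `logits` values are made up for illustration, and treating temperature 0 as greedy (argmax) selection is an assumption of this sketch; real inference stacks may handle it differently and can still be non-deterministic for other reasons.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumption of this sketch: temperature 0 means greedy decoding,
        # so all probability mass goes to the highest-scoring option.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate answers.
logits = [2.0, 1.5, 0.5]

probs_default = softmax_with_temperature(logits, 1.0)  # mass spread over all three
probs_greedy = softmax_with_temperature(logits, 0.0)   # all mass on the top score
```

        Lowering the temperature only removes the spread. If the top-scored candidate is the wrong answer, greedy selection returns that wrong answer every time.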
      - NonHyloMorph 14 hours ago
        That doesn't make it different from any other problem measured by statistical significance when averaged over a big enough series of comparisons, no?
- jdblair 7 hours ago
  The best mechanic I ever had kept my ‘98 Subaru going past 200k miles. Once during a repair I asked him to do an inspection and tell me if there was anything else I should replace. He told me not to do that, and that any mechanic would always find something, but not necessarily the next thing to break.
  He said it better using an expression I hadn’t heard before or since, something like “don’t go looking for goats when your herd is already with you.”
  [-]
  - dumb1224 3 hours ago
    Exactly. Old parts of the system will be working if you leave them undisturbed. Mechanics have very good intuitions of this sort of thing.
    I read before that there's proper engineering/physics theory about this too: a car as a machine is like a linear/smooth physical system with multiple weaknesses. Over a long period of running, many places might weaken, but it still evolves into a slightly different smooth system, until you introduce a replacement that causes an impedance mismatch or something like that.
  - tass 2 hours ago
    Maintenance-induced failures are what it’s called with small aircraft.
    You’ll do something to prevent a failure (like, replace an old but functional alternator) but cause an oil leak or engine vibrations because you had to remove the propeller to complete the job.
- john-tells-all 17 hours ago
  There's a big difference between a _puzzle_ and a _mystery_. In a puzzle, the goal state is known, and as more pieces - data - appear, the goal gets closer. You know how far you are from the goal.
  A mystery is worse. With each additional piece of data, the goal gets farther away. Everything is more and more confusing.
  (Popularized by Malcolm Gladwell)
  [-]
  - mrlongroots 15 hours ago
    Maybe I am missing something but I just find this wrong.
    Everything is a puzzle: there is one "Truth" or one diagnosis. You (a smart human) should be able to converge on it by cross-examining your LLMs. By themselves, they have no interest in revealing this, no stakes, which makes them tools only useful in the hands of a capable investigator.
    [-]
    - Paracompact 14 hours ago
      > You (a smart human) should be able to converge on it by cross-examining your LLMs.
      What makes you think this is fundamentally different from cross-examining ELIZA? There is no guarantee that the LLM will help you converge on anything. Indeed actually calling out an LLM on BS tends to eventually produce an "I don't know and can't help you further" answer (as it should).
      [-]
      - mrlongroots 14 hours ago
        > There is no guarantee that the LLM will help you converge on anything.
        Absolutely. The guarantee does not come from the LLM. The LLM is simply an improved version of Google Search.
        The guarantee can only come from a systemic application of epistemic discipline and reasoning, which is very much (smart) human territory.
        Put it another way, I could make good decisions with/without LLMs, with some uncertain diagnostics as input. I would have to trawl through 50 papers myself, and it is possible that my decision arrives 5 years too late as a result. LLMs enable trawling and do some of the legwork in connecting the dots, but are ultimately only as capable as the orchestrating human.
      - fc417fc802 13 hours ago
        The same goes for a human expert. There's no guarantee of convergence and you could eventually end up at "I don't know".
    - scheme271 10 hours ago
      The problem is that the diagnosis might not be known for a while. There are a few conditions and diseases that require an autopsy for a guaranteed diagnosis and are therefore diagnosed based on symptoms in clinical settings.
- 010101010101 17 hours ago
  > The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
  I'd argue that AI _can_ currently provide that, but that it can't do it _reliably_, and that to non-experts it's impossible to differentiate, which makes it all the more dangerous.
  [-]
  - margorczynski 17 hours ago
    Isn't that the case with human "experts"? If you had encounters with doctors, mechanics, etc. you'll know you can get a completely different diagnosis for the same problem which obviously means (in most cases) that the person you thought an expert is wrong.
    What is needed are studies that will take a cold look at the actual results because AI seems to be required to be perfect or it is useless. It just needs to be as good as a human for most stuff, but in the long run it will be much better. At least that's what extrapolating current reality shows us.
    [-]
    - wwweston 15 hours ago
      We have systems around humans that exist to manage expertise gaps, credibility signals, and accountability. This is part of what makes humans as good as they are, along with specialized training and some measure of meritocratic selection. We license and regulate and account and litigate to make a system that responds and improves.
      Some of this might be applicable to LLMs, but some isn’t and much of it would be resisted. This is one reason we’re not likely to get “as good as a human” because at some level we’re not optimizing for the outcomes; we’re optimizing for speed, convenience, some participant’s economics, and underlying beliefs.
      [-]
      - malfist 14 hours ago
        I've been going through PT for a hypermobility disorder related injury and I've used an AI to help me figure out "interview questions" to see if a PT knows anything about hypermobility or is willing to learn. I found it helpful for selecting a new PT after the first PT I trusted made things worse by prescribing stretches and no load progression from rest and recovery back to deadlifts.
        [-]
        kerabatsos 12 hours ago
        People put a lot of faith in human “guardrails”, standards, etc. But the same argument could be made that trusting human experts without discernment is as dangerous as trusting AI or Google or whatever other non-human source. It’s always been the case.
- throwaway2037 57 minutes ago
  You nerd sniped me with the story about your used car. What happened in the end? I really want to know! There are some fun YouTube channels that basically do the same. Someone who is an expert auto mechanic takes a used car to various repair garages and asks them to recommend a course of action.
  [-]
  - namelessone 42 minutes ago
    Sounds like a fun watch! What is the name of the channel?
- dumb1224 3 hours ago
  I tried that AI diagnosis for my 15-year-old Ford C-Max too. However, the issue with a diagnostic problem is that unless you've got the ground truth, there's simply no way to evaluate any tool or human with a metric that you can compare and use to decide on future tasks.
  The AI might be very good at diagnosing all minor issues, but might not lead to a successful repair, whereas human mechanics are extremely good on 80% of major issues: their diagnosis may not be the ground truth, but it will lead to successful repairs (that might not address the root cause but simply patch it). So it comes down to managing expectations / outcomes.
- Bratmon 16 hours ago
  To provide a competing point of anecdata: A Gemini diagnosis saved me $3,000 in unnecessary repairs on my Civic.
  [-]
  - fluidcruft 14 hours ago
    YouTube has saved me at least that much in appliance repairs... and it doesn't even have an AI. It's amazing how valuable access to information can be.
  - ahepp 13 hours ago
    I would love to hear more about this
  - dyauspitr 15 hours ago
    Saved me $2000 on a koi pond pump and filtration system
- ed_elliott_asc 17 hours ago
  The soothing sound of ChatGPT telling us how right and clever we are…how could it possibly hallucinate, certainly not 5.5
  [-]
  - nonethewiser 11 hours ago
    You’ve really honed in on the key issue. This is exactly how keen Hacker News commenters approach this.
- darkwater 2 hours ago
  > I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
  I would frame it differently: you now know which shops are not to be trusted. So, next time you need one, you will make a better decision.
  [-]
  - abirch 2 hours ago
    There are few things better in this world than having a car shop you can trust. I found one and pray that management doesn't change.
- serial_dev 14 hours ago
  These tools can’t reliably fix a 4px misalignment on my icon, better ask them about a medical report… but honestly, I would do the same.
  [-]
  - Gigachad 12 hours ago
    Tbh LLMs pulling data out of medical documents in their training set and searchable online is likely a much easier task than fixing some weird CSS alignment issue.
- ryukoposting 6 hours ago
  > I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
  I almost had a very similar experience with my beater Lexus. It took 2 independent shops and 3 dealers to finally figure out what was causing the ABS to go off randomly at low speeds. Turns out there's some obscure Toyota-specific tool from the late '90s that picked up a proprietary diagnostic code, and the third dealer was the only one that still had that particular piece of equipment.
  ...and of course, the thing that's broken has been out of production for 20 years and remanufactured ones cost more than the car is worth. I ended up just unplugging the ABS control module.
  Point being: once I knew what was wrong, all the seemingly contradictory information from the other 4 shops suddenly fit together. It's just such a weird thing to go wrong that no reasonable tech would ever have considered it.
- weatherlite 4 hours ago
  > it's better information, and AI cannot currently provide that
  It sometimes can. If it straight-out never could, no one would use it. People use it, lots of them.
- UltraSane 12 hours ago
  > There's something incredibly peaceful about being in the hands of an expert you trust
  This is the primary business model of enterprise IT and is why companies pay so much for 4 hour disk replacement.
- nonethewiser 11 hours ago
  You only got 3 opinions on your car? Why not 50? You could have found a more useful signal by getting more information.
  I get it - getting an opinion from a mechanic is time consuming. Not true of AI though.
kgeist 15 hours ago
A few years ago (before the AI craze), I was misdiagnosed with tuberculosis. I had a chronic cough, and an outsourced radiologist at a clinic found signs of tuberculosis. The findings were sent to the city's tuberculosis hospital, as required by the country's law. The doctors there took the radiologist's conclusion at face value and required me to stay at their hospital for at least 8 months under a strict, prison-like regime. There was no option to say no, because I was considered some kind of biohazard, and by law I had to comply.
Before I was admitted, I quickly found another radiologist, who diagnosed pneumonia instead. I sent his report to the chief doctor at the tuberculosis hospital, and after some deliberation they concluded that the original reading was wrong. Turns out the doctors there can't read scans at all and just believe whatever a radiologist says...
The funny thing is, they had already officially put me on the tuberculosis register and didn't want to admit they had made a mistake. So instead, they simply gave me another paper saying that I had been cured of tuberculosis by them... in 7 days. I'm probably the only person in the country to defeat tuberculosis in a week :)
So if you don't trust the radiologist/doctor, maybe find another doctor if you can afford it? You can compare their conclusions and see if they match. Two unrelated doctors or radiologists saying the same thing is probably about as close to the truth as you're going to get. I'm not sure though whether I should trust AI or humans more. AI can hallucinate, but I've been misdiagnosed by humans so many times too...
[-]
- engeljohnb 46 minutes ago
  A second opinion is a smart move if one has doubts about their diagnosis. Doctors make mistakes, and even though I've worked with countless great doctors, I've never worked a job where there wasn't at least one who was undiscerning, or downright lazy and negligent. It's hard to tell people to trust their doctor when I know there are plenty of doctors out there like this.
  But AI as of right now is worse than any bad doctor I've ever worked with.
- azan_ 15 hours ago
  How is it possible? You can't diagnose tuberculosis just based on imaging and tuberculosis hospital has to know that!
  [-]
  - kgeist 14 hours ago
    Yeah, I know! It was strange. They gave me a test, and it came back negative, but they insisted it was negative because I had "latent tuberculosis," which supposedly wasn't detectable by the test yet but was about to become active.
    I forgot to mention that, besides getting a second opinion from another radiologist, I also took a more modern test at another private clinic. That test has better detection rates than the one the state clinic used, and it came back negative too.
    I have suspicions they had some kind of government quota to keep the hospital staffed with patients in order to receive funding. Or they were just completely incompetent. I pushed back by bringing them another radiologist's report and the results of a better test that I paid for myself, so I guess they decided to back down.
    [-]
    - spwa4 1 hour ago
      You'll find doctors always believe and treat the worst diagnosis any professional has put on a case. That's a legal thing, not a skill issue.
      Think about the consequences of mistakes in both directions ...
  - shiandow 5 hours ago
    Not only that, what is the point of confining someone to prevent the spread of a disease about a quarter of the world is already infected with?
    I suppose there could be reasons, but I don't know them.
  - comboy 3 hours ago
    Incentives.
- igortg 14 hours ago
  I had a similar experience. My son had pneumonia and was still feeling pain after 10 days of antibiotics. Took an X-ray to three different doctors, and only one got the right diagnosis (pleural effusion). We should really have a central place with top-notch professionals looking at it, instead of having each doctor figure it out by themselves.
  [-]
  - mncharity 6 hours ago
    I once worked on a medical hackathon concept for computer-assisted population screening for cervical cancer in a developing nation. Community health workers take photos. The AI would look at the images, and make a call of "clearly negative" vs "clearly positive" vs "needs (scarce) expert review". But taking good photos is hard, so it's also "photos insufficient" and "worker needs additional mentorship on taking photos". Only by computers reducing all three costs - expert workload, exam success, and quality-control/training - might successful deployment be financially and logistically plausible for that nation.
- beacon294 10 hours ago
  What country / municipality are you in? This is not my understanding of Tuberculosis...
- rpastuszak 3 hours ago
  Asking for a friend, who is in a somewhat similar predicament — it wasn’t Portugal, was it?
themantalope 9 hours ago
Radiologist. I don’t read MR shoulder exams in my day to day practice, but from the few pictures shown, I can’t conclusively disagree with the original report.
These models are generally terrible at reading medical images. The amount of public training data on the internet compared to the number of scans a radiologist reads in training is minuscule. There’s obviously a ton of medical images in general, but very few, and even fewer accompanied by a report, are available on the internet publicly for download.
There are vision language models coming out of research labs that are excellent in describing and localizing findings. Still at the level of a 1st or 2nd year radiology resident, but as we all say - this is the worst the models will ever be.
[-]
- deaux 4 hours ago
  Absolutely. It's very unfortunate that this post used the worst example possible of using LLMs for medical purposes.
  General-purpose LLMs are _fantastic_ at medical diagnosis that do not involve imaging. I am completely convinced that given enough information and time, frontier models already outperform >90% of doctors on initial diagnosis of internal issues and suggesting medical tests to further reject or confirm the most likely theories. To the point where I'm eagerly waiting for the first hospital in the world that's willing to be open and honest about using them for that first step, and then proceeding from there. I'll be on a flight there as soon as one arrives.
  At the same time, they're worse than useless at anything involving medical imaging. Asking them to interpret them is worse than trying to interpret them yourself as a layman. And you surely wouldn't interpret them yourself.
  [-]
  - throwaway2037 23 minutes ago
    > General-purpose LLMs are _fantastic_ at medical diagnosis that do not involve imaging.
    Can you share the reasons that you believe this?
    > At the same time, they're worse than useless at anything involving medical imaging.
    What is special about medical imaging that makes AI/LLMs specifically bad?
- throwaway2037 27 minutes ago
  No trolling here: Do you feel threatened by the advance of AI/LLMs with respect to your field? I would. I am a computer programmer, and it absolutely feels threatening.
- yfontana 2 hours ago
  Yeah, medical computer vision is a (fascinating) field with a lot of ongoing research. SOTA models are highly specialized, and are only getting good enough to be used by actual doctors and patients. Using a general purpose LLM to do this is similar to giving a credit card to Openclaw and telling it to make you rich through the stock market & cryptos.
- odiroot 2 hours ago
  I can see how your thesis is valid.
  Like OP, I also had a shoulder MRI, and asked two AIs for opinion (awaiting a follow up appointment to discuss the results).
  They both insinuated a much more serious problem than it was (as judged by an orthopaedic doctor).
- billynomates 4 hours ago
  Anecdotally, I've had Claude (Sonnet and Opus latest) consistently misread numbers from screenshots of my macro tracking app. Makes me skeptical of claims about its usefulness for anything requiring accurate image interpretation, let alone MRI analysis.
- make3 6 hours ago
  [flagged]
voidUpdate 47 minutes ago
How do LLMs get information from images? Do they have to run essentially the opposite of an image generation model, taking an image and converting it into a description? I'm just concerned that the description wouldn't be able to encapsulate the information needed to differentiate exactly what is wrong with a shoulder. The image -> text model would need to know what it should actually report back to the LLM about the image, so that it doesn't just say "this is an MRI of a shoulder" or similar. It would be like a layperson describing a bridge, and asking an engineer if the bridge is safe based on that description
[-]
- weird-eye-issue 38 minutes ago
  No, it does not work like that. It can actually process the image itself; there is no intermediate image-to-text step.
  [-]
  - voidUpdate 35 minutes ago
    How does a Large Language Model process images then?
    [-]
    - jappgar 14 minutes ago
      It can only deal in tokens, so you're essentially right that it creates a textual description before describing it back to you. This process is obviously incredibly lossy and details are easily missed
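For readers following this sub-thread: publicly documented vision-language models (ViT-style encoders, for example) typically cut the image into small patches and project each patch into a numeric vector, which the language model then consumes alongside its text embeddings. The sketch below is a toy, pure-Python illustration of that idea only. It is not the implementation of Claude or any other specific model, and every name in it (`split_into_patches`, `project`, `PATCH`, `EMBED_DIM`) is hypothetical.

```python
import random


def split_into_patches(image, patch):
    """Cut a 2D grayscale image (a list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            flat = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            patches.append(flat)
    return patches


def project(patches, weights):
    """Linear projection: each patch vector becomes one embedding vector."""
    return [[sum(p * w for p, w in zip(patch_vec, column)) for column in weights]
            for patch_vec in patches]


rng = random.Random(0)
PATCH, EMBED_DIM = 4, 3  # illustrative sizes, far smaller than real models

# Toy 8x8 "scan" with random pixel intensities (stand-in for a real image).
image = [[rng.random() for _ in range(8)] for _ in range(8)]

# Random projection weights; in a real model these would be learned.
weights = [[rng.uniform(-1, 1) for _ in range(PATCH * PATCH)]
           for _ in range(EMBED_DIM)]

patches = split_into_patches(image, PATCH)
embeddings = project(patches, weights)
print(len(patches), len(embeddings[0]))  # number of patches, embedding size
```

In this picture the "tokens" the model sees are vectors of numbers derived from pixels, not words. Detail can still be lost (through patch size, resolution limits, or downscaling), but that is a different kind of loss from summarizing the image in prose first.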
manicennui 7 minutes ago
If you see an orthopedic surgeon (the much more common name for this type of doctor in the US) about almost anything you will almost certainly end up with an MRI (good chance your insurance won't cover this fully) and some kind of recommended intervention.
sxg 19 hours ago
I'm a radiologist but can't really weigh in without seeing the full 3D MRI dataset. Regarding this point:
> They performed shockwave therapy on my shoulder even though a recent clinical practice guideline says clinicians should not use or recommend shockwave therapy for rotator-cuff tendinopathy without calcification; I was told during ultrasound that there was no calcification.
Ultrasound isn't a great way to assess for calcification. It'll find large calcifications but easily miss small ones. A plain radiograph would be more helpful, but the MRI may have revealed it as well. Either way, shockwave therapy isn't harmful in the absence of calcification--it's just not helpful.
Edit: when a radiology report says something isn't present, there's always an implicit caveat that the finding isn't present within the context of the modality and images obtained. So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon, but clarifying this in reports would make them sound even more qualified, "hedgey", and annoying to read than they already are.
[-]
- ambicapter 17 hours ago
  > So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon
  This is being overly nice, I think. Anyone who doesn't understand this is an idiot imo. You would have to assume that every type of diagnosis instrument has infinite clarity and is always correct to be confused in this case.
  Reminds me of the Babbage quote where somebody asked him, if I put the wrong question into this computing device, will it still give me the right answer? His response, paraphrased "I can not fathom the logic of the minds which would come up with such a question".
  [-]
  - MattyMc 12 hours ago
    > Anyone who doesn't understand this is an idiot imo
    I don’t think that’s true. Avoiding this mistake requires knowing that an ultrasound may not detect calcification. For a patient reading their own report, I don’t think that’s intuitive. I would expect most people to read “no calcifications” and assume that their joint has no calcifications.
    [-]
    - Fr0styMatt88 8 hours ago
      Exactly. I was about to reply to the comment with “perfect example of not knowing what you don’t know” in terms of self-diagnosis.
      My internal model is/was “if the scan wasn’t set up / can’t detect the thing, why would the statement be present at all?”.
      That implicit assumption is really subtle.
    - nkrisc 12 hours ago
      Most people should have learned at a young age that absence of evidence is not evidence of absence. My 8 year old understands this. After all, you can rarely ever prove something does not exist, only that it is unlikely to exist.
      If a report states that X was not found, it does not mean X did not exist, it means it was not found.
      What may be lost on the layperson is the nuance and understanding of how thorough or not a particular scan is and how much weight to give the findings and thus the odds that the report is correct.
      [-]
      - sjducb 2 hours ago
        > Most people should have learned at a young age that absence of evidence is not evidence of absence.
        I’m fairly sure that there are no lions in my house. Lions are quite large and I’m capable of detecting lion sized objects with my eyes.
        To demonstrate that something is not present you first define the object, then come up with a test that will reliably detect the object. If the test comes back negative then the object is not there.
        In a strict philosophical sense I cannot prove that there are no lions in my house, the external world might not exist! A hypothesis that no one has thought of might be correct and that hypothesis could show that there are invisible lions in my house!
        However I intend to act with the certainty that there are no lions in my house. Because I have no evidence of lions in my house.
        Absence of evidence is evidence of absence.
      - aforwardslash 11 hours ago
        This is - by far - the most stupid stuff I've read on the internet the past few days. They didn't find cancer either (as well as a plethora of diseases that could be related to the symptoms), and afaik it's not in the report.
        Yah you can argue that the tool is not ideal for that diagnostic, yadda yadda. I get it, and in the end I agree with the subtle difference you highlight, because it is something that makes sense to a certain kind of people. You know how many medics would read the report exactly like the author did? Too many.
        How do I know? I'm not in a wheelchair after being constantly misdiagnosed by using the wrong imaging technique by (mostly) chance, and a good help from friends, including a surgeon. This seems to be a case where AI would be a valuable doctor tool for differential diagnosis; instead we have know-it-alls that can't bother to verify, and AI that often gets details wrong. That is the problem.
        [-]
        Fr0styMatt88 8 hours ago
        I think it’s the combined depth AND breadth of knowledge that can be captured by AI models that is going to make them way better than most humans at this kind of stuff.
      - Sabinus 11 hours ago
        It's like when finding out about the sex of your baby via ultrasound before they're born. If you're told it's a boy, you can be pretty certain you're getting a boy. If you're told it's a girl, you shouldn't get too attached to the idea. The ultrasound tech might just have missed the evidence your baby was a boy.
        [-]
      - ytoawwhra92 6 hours ago
        "Calcifications not found" is a different statement from "no calcifications".
        Even then, the context that "ultrasound isn't a great way to assess for calcification" is important when reading either statement. Laypeople don't necessarily have that context.
      - mewpmewp2 8 hours ago
        But the problem was that the report is not saying "not found", it is saying "is not present" or "there is no X".
        And I think we can easily have examples where we can reasonably trust this, and a spectrum of such.
        E.g. there is a math solution and the report says "there are no errors in this solution", you would imagine that to be quite reliable, no?
      - O_H_E 10 hours ago
        > Most people should have learned at a young age that absence of evidence is not evidence of absence.
        That might be true, but it is definitely not the world we live in.
    - eqmvii 10 hours ago
      It’s 2026 and my computer will happily give me the right answer even when i make typos. I love it.
    - tomlockwood 12 hours ago
      It's a fatal flaw to think counter-intuitive == wrong.
  - Georgelemental 12 hours ago
    > You would have to assume that every type of diagnosis instrument has infinite clarity and is always correct to be confused in this case.
    There's a difference between 99.9% clarity and 50% clarity. Even if neither exactly equals 100%, it's understandable that a layperson would expect different language between them
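The gap between those two figures can be made concrete with Bayes' rule. The sketch below treats "clarity" as test sensitivity (an interpretation, not something the commenter stated) and uses a made-up pre-test probability of 20%. It assumes perfect specificity by default to keep the arithmetic simple. None of the numbers are clinical data, and `prob_present_given_negative` is a hypothetical helper name.

```python
def prob_present_given_negative(prior, sensitivity, specificity=1.0):
    """P(finding is present | test came back negative), via Bayes' rule."""
    false_negative = (1.0 - sensitivity) * prior   # present but missed
    true_negative = specificity * (1.0 - prior)    # absent and reported absent
    return false_negative / (false_negative + true_negative)


PRIOR = 0.20  # assumed pre-test probability, for illustration only

for label, sensitivity in [("99.9% sensitive", 0.999), ("50% sensitive", 0.5)]:
    posterior = prob_present_given_negative(PRIOR, sensitivity)
    print(f"{label}: chance the finding is still there = {posterior:.4f}")
```

Under these assumptions a negative result from the highly sensitive test leaves almost no residual probability, while a negative from the 50% test only moves the estimate from 20% to roughly 11%. Both reports could read "no calcifications" word for word.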
  - BrokenCogs 9 hours ago
    This comment sounds like it's written by someone who doesn't interact with real people very often
    [-]
    - DrewADesign 9 hours ago
      I’ll bet they’ve got a debilitating case of engineer’s disease, too.
  - Paracompact 14 hours ago
    "On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
    [-]
    - IanCal 14 hours ago
      Off topic but I have always felt this seemed like his misunderstanding rather than theirs. It’s an odd question, but it’s a very sensible point to make if Babbage has just told you this will solve the problem of mistakes in calculations - humans being involved at the start means human error still plagues the output.
      [-]
      - jrumbut 8 hours ago
        > I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
        Well, he did diagnose the situation correctly. He couldn't comprehend the confusion of ideas that provoked the question.
        I'm also not entirely sure it's an odd question to ask. To this day, users are surprised when their software produces garbage output instead of failing. Perhaps the members of parliament were expecting some form of input validation or sanity checking of output.
      - Paracompact 13 hours ago
        Looking into his biography, it seems that he was indeed pitching the engine not as a means of efficiency, but as a means of avoiding mistakes in mathematical tables. It would have done Babbage well to insist he couldn't possibly solve all classes of mistakes, but would have solved a great many of them! "Why yes Senator, you are quite intelligent and handsome and make a fair point, allow me to give you the finer picture..."
        Would have also been a fair point if Babbage had channeled his inner techbro and insisted it would directly replace human calculators; simple machines like Babbage's will chug along blindly on obviously erroneous data, but humans for all their sloppiness can often backtrack on errors.
    - areoform 14 hours ago
      To quote the LLM-ism, they were making a sharp point. It doesn't matter how precise the calculations are if you're calculating the wrong thing.
      I suspect their sarcasm might have escaped Babbage who seems to have been on what we now call "the spectrum."
    - Fr0styMatt88 8 hours ago
      Actually, I would be really pleased if a member of Parliament asked that. That shows a level of deeper consideration.
      Isn’t there a saying about there being no stupid questions, only stupid answers or something?
  - DrewADesign 9 hours ago
    I don’t think people are idiots if they don’t understand how a normally intelligent person might not intuit that. I do think they have a seriously underdeveloped theory of mind.
  - akoboldfrying 12 hours ago
    > Anyone who doesn't understand this is an idiot imo
    I disagree. A priori it's not obvious to a layperson whether or not a statement that uses unconditional phrasing is intended to be authoritative or conditional on something unspecified, like the resolution of the measuring device. This goes for any sufficiently technical field.
    If you got the brakes checked on your car, and the mechanic did <something> and told you there are no issues with them, and you then took your car to a different mechanic who did <something else> and told you there is a problem, you would not be an idiot for thinking that these conclusions contradict one another.
  - BurningFrog 9 hours ago
    > Anyone who doesn't understand this is an idiot imo
    Even if this is true, so what?
    Idiots get sick at least as often as others, and the medical system needs to work as well as it can for that population too.
  - crypttales 14 hours ago
    [dead]
- rylando 13 hours ago
  As a rad tech, YOU TELL ‘EM DOC! I do like some uses of AI I’ve seen that help patients advocate for themselves or understand basic things like blood panel numbers, but it’s really bad at glazing people and leading them down medical rabbit holes kind of like the OP.
  You would think that the AI would point out that calcium is best demonstrated on Radiographs/CT imaging vs Ultrasound or something to that effect.
  [-]
  - garciasn 13 hours ago
    Semi-related: my father has complications from a motorcycle accident ~25y ago that crushed arteries in his leg coupled with diabetes (insulin / kept sugar at ~100 and his A1C was kept under 6.7 for ~15y). 6w ago had to have his toes removed due to dry gangrene; they eventually (2.5w ago) had to remove his leg below the knee because of the severe blood flow issues below the knee.
    Between the toes and the below the knee amputation, there were no less than 15 different doctors and PAs / related personnel who COULD NOT COME TO A CONSENSUS. They would just tell my mother and I (PoA) the details; they refused to come up with a singular plan of action moving forward, leaving it up to us to make 'an informed decision,' something that's IMPOSSIBLE when you have to take up to 15 different opinions into consideration.
    What exactly are we supposed to do as patients/family members when medical personnel cannot give reasonable paths forward and instead just throw a bunch of shit over the fence at you and tell you, "you decide what to do from here," regardless of how many VERY DIRECT conversations I had w/the 'care team' on doing better to provide a limited array of options and reasons/likelihood of 'positive outcomes'.
    I'm used to dealing with a wide variety of stakeholders/SMEs in decision-making; it's my job to apply my extensive industry experience to present our clients with their options, ranked and reasoned. Doctors, in my experience and most recently with my father, clearly do NOT do that (I assume due to liability; but, no real idea, honestly). So; when dealing with LIFE CHANGING circumstances, what are we supposed to do except rely on what might be able to offer more analysis and option narrowing w/AI?
    I certainly don't want to make the job of medical staff more difficult by putting out crazy theories I found on the interwebbernets through my own research, etc; but, when we're having to deal with uncertainty and insanity, what else can we do?
    [-]
    - resonious 12 hours ago
      This lines up with my experience with my mother, though it played out differently. In her case, she would switch doctors every ~5-10 years and each time they'd basically say everything the previous doctor said was wrong. First it was "you have Lupus", second it was "actually it's some other autoimmune disease", then it was "actually whatever you had has been in remission for some time now and you've been taking brain-numbing medicine for no reason." Then it was "you have cancer", "it's a rare one", and "oh turns out the brain-numbing meds have a correlation with rare cancers". The cancer part was handled well (albeit unsuccessful) though. After such a bad time with rheumatologists, I was shocked by how competent people were when it came to cancer.
      All of the above was intertwined with brief stints with doctors that would just berate her for being a painkiller junkie, even though she hated the stuff and just wanted to find/fix the problem.
      Kind of a rant, really. I'm not sure how to tie it back into AI. I do wish we had AI at the time so that we could at least cross-check, but I also understand that doctors are already sick of patients self-diagnosing on the web and that AI probably just makes that worse. At the same time, if our medical system could catch up a bit (more doctors? less corruption/paperwork? not sure what it needs) then maybe people would be less inclined to take matters into their own hands.
      [-]
      - anon84873628 8 hours ago
        I'm sorry to hear that. The accusations of drug seeking are particularly galling.
        AI is absolutely a god send for patients navigating the medical system.
        I know the US system is horrible and I sympathize with doctors doing their best within it. But we must admit, they are also responsible for the countless stories just like yours, and have contributed to the public's deteriorating trust of medical institutions. It's not just the insurance companies and conglomerate CEOs.
    - osmano807 11 hours ago
      Probably liability... on the amputations I indicated and contraindicated, it's increasingly difficult to navigate through patient perceptions while not disclosing so much as to give them rope to hang us. Some decisions are a game of probability that often we don't have clear numbers. In trauma, I have both cases where I recommended an amputation and at last minute decided to see that happened and the patient is walking with their leg today; and cases where I didn't recommend and later had to amputate as the lesion evolved. With cancer it's more straightforward, the cancer is what dictates the surgery... some cancers have poor response to other treatments, so we amputate. Some cancers had invaded the neurovascular bundle, so curative options involve necessarily amputation to get good margins. In cancer there's less doubt in the prognosis, so less chance of legal ramifications.
  - Fr0styMatt88 8 hours ago
    You see this in coding agents too. The only times so far I’ve really seen Opus tie itself into a knot is where I’ve asked it to fix something that I thought was broken but actually wasn’t in the way I had described. It will bias towards your description (I’m guessing because that’s the most recent context it has?).
  - mring33621 10 hours ago
    i'm sorry, but AIs only "know" about stuff that they have been trained on.
    If we would allow AIs to be trained on the petabytes of medical data hidden in hospital systems, they would most likely be much better at diagnosing illnesses and conditions than the average doctor.
    (Justifiable) Privacy around medical records so far prevents this.
    You think you're cheering for humans, but in fact you are gatekeeping healthcare.
    [-]
    - prirun 9 hours ago
      I dunno... if we gave an AI all of these medical records as training data, wouldn't it be trained to give the same answers as the doctors already gave, without knowing whether those diagnoses were correct or not?
      [-]
      - anon84873628 8 hours ago
        Except it would see all the times similar starting conditions led to different diagnoses and recognize those contradictions. Or all the different treatments and their outcomes. And it would never forget or have bias.
        It would be like the sum of all medical professors in existence.
  - Eufrat 12 hours ago
    I feel like the promise of these models is to help people make more informed decisions. Improving the knowledge economy and general understanding.
    The problem is these are just statistical models at the end of the day, so you need to know something to be able to identify the errors. You can’t let them really be autonomous and you also can’t really have people turn into glorified approvers. If the machine is correct 89% of the time, you cannot make people responsible for that 11%. It’ll just cause automation fatigue.
    tl;dr: the actual use cases of these LLMs (or generative AI in general) are rather limited, so it is offensive how much hay has been given to them eating the entire capitalist system. They are not fit for purpose.
    [-]
    - anon84873628 8 hours ago
      Why should we not expect a computer vision model to outperform humans on reading medical images?
      The human experts are literally just a trained biological neural network. In this domain they are not capable of anything a computer can't already do.
      [-]
      - Eufrat 7 hours ago
        > Why should we not expect a computer vision model to outperform humans on reading medical images?
        Humans can identify. A computer vision model can return a statistical value. Both can make errors, but these errors are orthogonal to how we work and what is being asked of them. I think a CV model can absolutely provide value as augmentation. Identifying possible misses or a different diagnosis worth considering, but that is not what is being asked of them here. The pitch by Altman and Amodei is not to say, “This tool that might cost $1,000/month can help increase the accuracy of your diagnoses by 10%,” instead it’s, “This tool can allow you to keep 10% of your workers to monitor it and you can fire the rest. Also, the workers carry all the liability.”
        > The human experts are literally just a trained biological neural network. In this domain they are not capable of anything a computer can't already do.
        People need to stop making this baseless claim. Human beings are not stochastic computing devices, we are not neural networks. We don’t fully understand human cognition and intelligence. I have the highest confidence we will figure it out one day, though.
        Yes, neural networks were based on a superficial view of the human brain, that’s it. For instance, it is biologically impossible for the human brain to do backpropagation, which is kind of important for a modern neural net.
        This really rubs me the wrong way because it's objectively false, but people keep bringing it up because I think people want it to be true rather than accepting generative AI for what it is: a tool with a bunch of caveats.
- 2ap 16 hours ago
  Agreed. Not a radiologist, but I do a fair bit of MRI research. Experts vs lay people probably have different success with getting the right diagnosis out of a frontier model. Subtle changes in prompts can cause different diagnoses [1]
  [1] https://www.nature.com/articles/s41591-026-04501-8
- haldujai 8 hours ago
  Radiologist who does read shoulder MRI would like to add that over half the annotations are wrong, glaring mistakes in anatomy and cardinal direction which begs the question of how is it making these findings without knowing what it’s looking at (here’s a hint, it’s hallucinated based on reports it sees).
  [-]
  - red75prime 5 hours ago
    What is "it"? Claude Opus 4.x? ChatGPT-5.x? GLM? DeepSeek? RadFM? Med-PaLM?
- odiroot 2 hours ago
  Can vouch for it. Ultrasound hasn't found calcification in my shoulder but MRI did. Exactly as you said, because it was very small.
- foobarian 18 hours ago
  Huh, I'm reading and looking up these words you guys are saying and it is starting to look exactly like the symptoms I have been having with my own right shoulder! I feel like a giant gaping rabbit hole just opened up next to my desk.
  [-]
  - sxg 18 hours ago
    We're discussing calcific tendinitis (https://radiopaedia.org/articles/calcific-tendinitis?lang=us). If you think you have it, you can see a doctor and consider shoulder radiographs to start.
    [-]
    - YeGoblynQueenne 14 hours ago
      If you think you have it, then you don't. If you have it, you won't think, you'll know.
      Spoiler: because it hurts like hell.
- tiahura 19 hours ago
  Why isn’t diagnostic ultrasound used in orthopedics? They inspect fetus hearts and other organs everyday, why not shoulders? Seems much cheaper and faster.
  [-]
  - sxg 19 hours ago
    They do. Ultrasound in orthopedics is a relatively newer field, and there aren't quite as many sonography techs and radiologists experienced in reading these studies, which is likely why you don't see it offered more widely.
    Edit: I should mention that ultrasound is basically unusable for evaluating bones. Sound waves can't penetrate bone, and so you end up just seeing a huge black void. That's a huge orthopedics use case that ultrasound just can't benefit. However, ultrasound is fantastic for evaluating muscles, ligaments, tendons, and other superficial soft tissues.
    [-]
    - VoidWhisperer 9 hours ago
      Serious question: If the bones specifically show up as black on ultrasound but the surrounding (muscle, etc) don't, wouldn't that be an option that could be used to try to determine a broken/fractured bone without the radiation from an xray? Or are the gaps in those cases usually too small to pick up?
  - scrollop 18 hours ago
    We order ultrasounds all the time for shoulders (for like soft tissue issues; for trauma, you'd start with an xray). For other joints, such as the knee, MRIs are a better choice (unless there has been substantial trauma, in which case xray initially or further), though more expensive, unless you're excluding a Baker's cyst, in which case an ultrasound is fine.
    Since MRIs are more expensive, private doctors might order them instead of ultrasounds.
    (I'm a doctor)
    [-]
    - tiahura 14 hours ago
      Where are you? Pi and work comp attorney in medium US midwest metro. I've never seen one in 20y. Not from HCA ERs, medicaid er visits to univ affiliated er, nor prestige practices.
  - trentor 12 hours ago
    Ultrasound was overlooked by US medicine as a first line imaging tool for a long time because it takes real skill and experience to do it right. But it's making a comeback. We've had Chinese, Indian, Australian, and American doctors visit us for one to two month stints to build up their skills.
    Given the skill involved, it's probably a liability concern they don't want the exposure over there.
  - prdonahue 14 hours ago
    They're used quite a bit for nerve entrapment—both in diagnosing and treating.
  - bflesch 18 hours ago
    It's a manual, non-standardized process without a standardized output. Image quality depends both on user skills (how deeply they press the sensor on the skin) and the machine they have. Unlike CT/MRI the examination results cannot be easily shared and compared between patients for studies.
- engeljohnb 18 hours ago
  > I'm a radiologist
  Any comment that doesn't start with this or similar qualification should be taken with a grain of salt (yes, including this one).
  Medical imaging is one of those things everyone thinks is simple because they don't know what they don't know. I'm a cardiac sonographer, and I have to assume radiologists hear at least as many eye-rolling takes on AI coming for their job as I do.
  [-]
  - lostlogin 18 hours ago
    Ahh, AI is coming for your job.
    Full sarcasm, is there one that’s more immune?
    [-]
    - engeljohnb 17 hours ago
      I don't completely understand what you mean, but I can tell you for my job, having AI tell you how to get the images is (without exaggeration) like putting someone who's never played an instrument on stage and saying "don't worry, the AI will show you how to do it."
      [-]
      - lostlogin 14 hours ago
        I did a lot of cardiac MR and often GA cases. Sometimes after the scan an echo would be done.
        I know my anatomy and etc and have done a short stint in ultrasound. I have no idea what you are doing or looking at and can identify pretty much nothing.
        Echo techs are going to be around a lot longer than MR techs.
    - LearnYouALisp 17 hours ago
      cough Immunology
      [-]
      - 2ap 17 hours ago
        I mean, probably not. No expert, but every time I go to an immunology meeting (I'm a paediatrician) they've got a whole stack of new diseases. The field is moving fast, and there has to be a careful amount of shared decision making about when to test, what a positive test means and so on. I reckon they're as safe as any of us.
        [-]
        LearnYouALisp 14 hours ago
        yeah, you said "one that is more immune"
- RA_Fisher 16 hours ago
  So Opus might be correct?
- backtoyoujim 16 hours ago
  Does radiology really make +$700,000.00 a year ?
  Someone on reddit claiming to be a radiologist claimed that.
  I wonder where the savings will go when those jobs are gone.
  [-]
  - Eji1700 16 hours ago
    > Does radiology really make +$700,000.00 a year ?
    The radiologist I know does not, but they are paid very well (and these numbers are always dumb when you're not sure if they're living in Manhattan vs literally anywhere in Kentucky)
    Like most medicine, a large % of the job could be done by any decently talented person willing to follow instructions and shadow for a few months.
    Like most medicine, the remaining % is what you're paying for, because it is literally life and death and you can't do things like "pull the logs" or "lets turn it off and take it apart" or "huh i need to put this down and come back later". Even in radiology, because "well lets just do it again to be sure" is often not a viable option.
    While there is a problem in how we have inflated the cost of education for medical fields, the insane health insurance issues (US obviously, but it does have some effect globally when the expert radiologist you hire from the US to help with research costs that much), and probably some better ways to approach splitting the work for the entire field, like most professions dealing in life or death, medicine likely will always be paid well.
  - sarchertech 15 hours ago
    Physicians salaries account for about 8% of healthcare costs in the US.
  - recursive 14 hours ago
    The savings go straight into patients' worse outcomes.
  - blanched 16 hours ago
    You know the radiologist you're responding to is a real person? Your last line seems needlessly callous.
  - the_real_cher 16 hours ago
    To the consumer! Haha just kidding. We all know where they'll go.
piterrro 15 hours ago
It's funny to see that the community here expects the human body to be treated like a deterministic function: for input X expect output Y - and that transfers to diagnosis - people expect to receive the same diagnosis from different specialists for the same issue.
Given human body complexity, the diagnosis is a compound output of the experience, knowledge gained throughout the career and diagnosis methods/equipment, the title (like Dr) is a certification imposed by the state so it's "safe" to let people practice since they passed "the bar" - but that doesn't imply everyone will be treating the same.
Some specialists update their knowledge monthly, some yearly and some don't do it at all, there are so many variables in play here (geo, politics, even weather haha).
Having said that, choosing the specialist is really important, getting opinions about their practice and their speciality, you can only maximize your chance of getting the right diagnosis, but don't expect to get it right just because somebody is called a Dr.
[-]
- charles_f 15 hours ago
  > It's funny to see that the community here expects the human body to be treated like a deterministic function
  In a community largely made of people whose job it is to produce such functions, I'd say it's to be expected
  [-]
  - KingMob 6 hours ago
    It's funny (and a little depressing), because HN routinely assumes that their world view, and thus, their domain expertise, transfers.
    There's no shortage of tech people convinced they deeply understand law, medicine, philosophy, etc. despite never having read much on the topics.
    [-]
    - bpicolo 19 minutes ago
      The internet at large is full of armchair experts, it's not just a tech thing.
- b800h 15 hours ago
  I'm not sure what your point is. Are you saying that medicine is inherently fallible and therefore AI is more likely to make a good diagnosis - particularly a cluster of specialist AIs?
  [-]
  - mrlongroots 15 hours ago
    Yeah I think the OP is muddling the point by conflating "physician's version of the diagnosis" with "The Diagnosis".
    There is absolutely one "The Diagnosis". The human body is a machine, albeit a very complex one, and all measurement sources have noise. But they are all measuring one reality, and if there is a problem, there should be one explanation that all measurements align with. They can be noisy but can never be conflicting (instrument error notwithstanding).
    Physicians' ability to arrive at "The Diagnosis" would vary, but it does not mean one does not exist. I am not sure if characterizing the human body as deterministic or not is relevant here.
    [-]
    - piterrro 14 hours ago
      I think „the diagnosis” is over simplification and lots of professionals would disagree that there’s always a single one. As a patient your goal is to eliminate the symptoms of whatever is going on in your system. Often times there could be many reasons for it, and curing only one of them can already help you. The diagnosis is a tool to help choose the right treatment method.
      Thus, chasing the „right” diagnosis (whatever that is?) is pointless, as only the outcome (reducing symptoms, stopping the damage) can tell you if the diagnosis was right, and even then not that it was the only right one.
      [-]
      - mrlongroots 14 hours ago
        > I think „the diagnosis” is over simplification and lots of professionals would disagree that there’s always a single one.
        "The Diagnosis" does not mean "one root cause".
        Situation: my car has some unexplained vibrations.
        1. Mechanic A says that it is the engine mounts
        2. Mechanic B says that it is some weirdness in how the exhaust assembly is hanging to the underbody
        3. Mechanic C says that it is just my wife farting
        I replace engine mounts and 40% of the problem is reduced. I then drive without my wife and the remaining 60% is solved.
        "The Diagnosis" was: 40% mounts, 60% wife, 0% exhaust.
        There is always one "The Diagnosis".
        [-]
        exmadscientist 13 hours ago
        > There is always one "The Diagnosis".
        No, that is not true at all.
        This is a kind of thinking a lot of programmers fall prey to. The real world, outside of code, is a very fuzzy and inherently analog place. There is very rarely one in any complex system having a complex problem needing a complex solution. At some point even the definition of diagnosis gets fuzzy.
        The best demonstration of this in medicine is probably the DSM-5. What, really, is the difference between Narcissistic Personality Disorder and Borderline Personality Disorder and Generalized Anxiety Disorder? Can they overlap? (Yes.) How do you treat them? (It's not easy.) What about depression: how do you tell if someone has Major Depressive Disorder or Bipolar Depression? (Again: not easy.) In some circumstances the only way to tell the difference between the two is what drugs work: if antidepressants help, it's Major Depression; if mood stabilizers help, it's Bipolar Depression. It's kind of odd to define a One True Diagnosis by "well we fixed it this way, so it must have been that", with no other way to do it, isn't it? (What if both work? What if one works for a while, then the other works? What if treatment with antidepressants induces bipolar (hypo)mania? All of those happen!)
        And that's just a few examples.
        [-]
        mrlongroots 11 hours ago
        Psychiatry gets complicated because the failures are not mechanical. Even if you could image every single neuron in a person's head we do not have a very good way to define an algorithm for these issues. I do not have a good answer for psychiatry.
        > This is a kind of thinking a lot of programmers fall prey to. The real world, outside of code, is a very fuzzy and inherently analog place.
        Having said that, I would vehemently reject and push back against this, and without doubting your sincerity, characterize it as an ad hominem.
        The vast majority of issues with the human body are mechanical in nature. Restricted blood flow, unwanted tissue, a broken bone, a bad valve etc. These are causal descriptions of "disease". Where causal descriptions exist, the "One True Diagnosis" principle holds. Psychiatry just happens to be unique in that it is a fuzzy science where we rely on checklists and ultimately all diagnosis is probabilistic.
        EDIT:
        > This is a kind of thinking a lot of programmers fall prey to. The real world, outside of code, is a very fuzzy and inherently analog place. There is very rarely one in any complex system having a complex problem needing a complex solution. At some point even the definition of diagnosis gets fuzzy.
        I would also push back against this mindset in general. This is not a falsifiable claim, it is incoherent as an argument, and I do not need to be a programmer to hold this position.
        That the real world is analog is irrelevant to its amenability to causal explanations. Or "fuzzy": "fuzzy" in this context just does not mean anything.
        I am not trying to sound exasperated or win internet points, just impress this point on you and anyone reading this. We can write math to predict weather, make it tractable to solve using approximations, tolerate IEEE 754 weirdness, and finally tell what the clouds will do a week from now. This is nature telling us that there is a pattern to how it behaves, and it is the only weapon we have as scientists.
        To say that nature is not amenable to explanations is a very defeatist thing to say: neither Newton nor Einstein nor any of the million-odd people that have built modern society would exist if nature did not have causal explanations. I urge you to reject this defeatist thinking.
        [-]
        movpasd 2 hours ago
        Not GP, but I'd argue that over-rationalism and underestimating both the complexity of the real world and the theory-ladenness of one's perspective is just as dangerous. The point is not to be paralysed by complexity, but to acknowledge it and acknowledge the reality of unknowable unknowns in our decision-making. I don't consider that defeatist in the least. Epistemic humility is the rational response to a complex world; courage is to act anyway.
        scheme271 10 hours ago
        There are quite a few diseases and conditions that don't have definitive tests. For example, Alzheimer's and Parkinson's are diagnosed based on medical history and symptoms. With Alzheimer's an autopsy can tell for sure but that's not much help for a patient. I'm sure there are other things out there with similar situations. Hard to come up with "the one true" diagnosis without a definitive way to determine it.
        [-]
        mrlongroots 10 hours ago
        > With Alzheimer's an autopsy can tell for sure but that's not much help for a patient.
        Ok let us unpack this statement.
        For your point to hold, I would have to be saying "all kinds of practical diagnostics are invented now. No progress can be made in better diagnostics".
        If Alzheimer's can be validated by slicing open a dead patient, there is a causal mechanical explanation for the disease. If we can not confirm that defect without slicing open the patient, that is a limitation of 2026 tools. The "One True Diagnosis" is an Oracle explanation that all real diagnostic techniques try to approach in the asymptotic sense, and it is helpful exactly because it clarifies in discussions like this.
        There are going to be diseases where we do not yet have causal explanations. Or where we treat them without establishing them. Hypertension is one example: while technically it can be caused by vascular stiffness, some weirdness with the RAAS system, some hyperadrenergic weirdness, practically you get a lot of mileage out of just prescribing people telmisartan if they're old.
        That does not mean the frontier of hypertension is settled, or the 10% who do not have a vascular stiffness problem would not benefit from better causal models of hypertension. Science is us continuously pushing back against the fog: of the tools we have in 2026, some are great, some are imperfect, some are promising etc.
        [-]
        scheme271 9 hours ago
        There might be "one true diagnosis" but there's no reason to believe that we'll have practical diagnostic tools to get it. If we need to sample the brain chemistry to diagnose a neurochemical disorder, it's probably not too useful in a clinical setting. The world makes no guarantees that we will be able to differentiate between certain situations with tools that we can realistically access and build.
        [-]
        mrlongroots 8 hours ago
        Today's limits are known and indisputable. Tomorrow's limits are a promise: some promises over-deliver, others under-deliver. :)
        Regardless, to bring the discussion back to the claim at hand: at all points in the future, we will need the ability to reason under partial information. "Absolutely flawlessly complete diagnostics" is an asymptotic goal we get closer to but never reach. This is both very doable for a disciplined human, and very hard to outsource completely to an LLM. Treated as tools operated by competent users, they are magical. But they can not outperform their user.
        KronisLV 4 hours ago
        > We can write math to predict weather, make it tractable to solve using approximations, tolerate IEEE 754 weirdness, and finally tell what the clouds will do a week from now.
        Even so, we’re operating on approximate datasets and sometimes our predictions are wrong. I think a lot of the medical field is like that - people are doing the best they can with what they have.
        It’s entirely possible that DSM-5 will be viewed as flawed and inaccurate in a century, but it’s better than nothing.
        Similarly, for every possible medical affliction there could be “The Diagnosis” that would describe how to treat it, we’re just unable to be that accurate and thorough. The fuzziness just means that you’d need 10’000 data points about the state of the body instead of 10-100 and also be able to reason about them.
        Paracompact 4 hours ago
        Most disorders in the DSM-5 are defined by polythetic criteria, i.e. meeting X out of Y symptoms from a list for a given duration of time, or by conjunction of polythetic criteria. These definitions are socially constructed and statistically validated for pragmatic use, but very rarely have definite underlying biological markers. Especially as concerns personality disorders, these disorders can also simply be an inheritance of cultural or political baggage and prior psychoanalytic theory.
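        To make "polythetic" concrete, here is a minimal Python sketch of an "at least X of Y symptoms for a given duration" rule. The class name, the symptom labels (`s1` to `s5`) and the 3-of-5 / 14-day thresholds are all made-up placeholders for illustration, not taken from the DSM-5 or ICD-11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolytheticCriterion:
    """Hypothetical rule: at least `threshold` listed symptoms for `min_days`."""
    symptoms: frozenset
    threshold: int
    min_days: int

    def is_met(self, observed, duration_days):
        # Count only the observed symptoms that appear on this criterion's list.
        matched = len(self.symptoms & set(observed))
        return matched >= self.threshold and duration_days >= self.min_days


# Placeholder criterion: any 3 of 5 symptoms, present for at least 14 days.
example = PolytheticCriterion(
    symptoms=frozenset({"s1", "s2", "s3", "s4", "s5"}),
    threshold=3,
    min_days=14,
)

# Two presentations that share only one symptom.
patient_a = {"s1", "s2", "s3"}
patient_b = {"s3", "s4", "s5"}

print(example.is_met(patient_a, 20), example.is_met(patient_b, 20))
```

        Both presentations satisfy the same rule while overlapping on a single symptom, which is one way to see why a label defined like this need not point at a single underlying mechanism.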
        > In some circumstances the only way to tell the difference between the two is what drugs work: if antidepressants help, it's Major Depression; if mood stabilizers help, it's Bipolar Depression.
        This is ridiculous. There is zero mention in the DSM-5 or ICD-11 of "if these drugs work, it's this, otherwise it's this." I would question a psychiatrist dispositively making a diagnosis on such grounds.
throwforfeds 18 hours ago
I've seen a lot of friends and family members almost immediately get offered surgery for shoulder pain. It's just often the default for people that do surgeries for a living.
I also had a pretty painful shoulder issue at one point, where the pain just wasn't subsiding for months. I tried massages and acupuncture as I didn't want to do surgery, but it wasn't helping at all. The thing that fixed it for me was just really focusing on doing pull-ups. I couldn't do them at all when I started, so I began with dead hangs and scapular pull-ups, eventually progressing to regular pull-ups, and then training with a "grease-the-groove" method once I could get a few per set. I stopped the training schedule once I was getting in around 17 pull-ups per set, and now just do 6 sets of about 7-8 pull-ups 3x per week spaced throughout the day. I'll also do some shoulder mobility drills [1].
Whenever I get lazy about keeping up with them, discomfort inevitably starts arising again, but it goes away once I get back to strengthening.
[1] https://www.youtube.com/watch?v=vP8YmmRMz6I
[-]
- dguest 2 hours ago
  Personally I've always appreciated talking to nurses I know.
  The respectable ones know they aren't doctors, but they've seen a lot more recoveries and cases where minimal intervention was required. As some people have said, some surgeons like to cut people up.
- ktosobcy 16 hours ago
  I had issues with my shoulder for years. Tried PT as well as pull/push-ups but doing that made the pain worse (if I wasn't doing any exercises involving the shoulder it was "fine")…
  [-]
  - dripdry45 14 hours ago
    same here. I started doing yoga and rock climbing, and it stretched everything out, and strengthened all the muscles around it. I rarely have an issue now.
- alistairSH 18 hours ago
  On the flip side, when I had rotator cuff issues, the surgeon recommended months of physiotherapy before resorting to the knife. And it worked. And by weight training regularly with a focus on correct shoulder movement, the pain stays away.
  It really seems like if you, as a patient, go looking for a quick fix, that’s what you’ll be offered. And if you educate yourself a bit and then go looking for the best fix for you, you usually get that.
  [-]
  - preg_match 17 hours ago
    Physical therapy is very often under-recommended in the US under the belief that insurance won’t cover it. They might. And, for anyone reading, you don’t even need a referral for the first 30 days in some states. Physical therapy is for more than just hip replacements and car accident trauma. Like regular therapy, a lot of “normal” people can benefit from it. It’s also not just stretching.
    [-]
    - pseudoramble 12 hours ago
      As somebody in the US who had to do 2 months of PT before I could even get an MRI of an injury, this is both surprising, and yet also not, to hear.
      I broadly agree though; about a decade ago I had the standard office worker low back pain problems which cleared right up after doing squats multiple times a week. Of course a decade later I managed to blow out a disc at the gym, which I still work through as I write this today, but well worth the risk in the long run. Even with that long experience of strength training, the PT was worth it even if it didn’t fix my problem entirely. It added some variety and pointed out some details I had overlooked to improve my shoulder health.
    - alistairSH 12 hours ago
      Interesting. I’ve never not had some PT coverage. The copays kinda suck, but major surgery tends to add up as well, so…
  - blitzar 5 hours ago
    > the surgeon recommended months of physiotherapy before resorting to the knife
    In my limited experience, "If all you have is a hammer, everything looks like a nail", rings particularly true with medical professionals.
  - huhtenberg 17 hours ago
    What did you have exactly?
    With calcifications, physio without the shockwave component definitely doesn't allow going back to the normal gym routine. It's just not enough.
    [-]
    - alistairSH 12 hours ago
      Garden variety inflammation with some minor tearing, exacerbated by weakness/instability.
      Strengthening with PT kept the joint stable enough to stop rubbing and allow the inflammation to clear.
      And as long as I stick to a regular gym routine that includes rotator cuff work, it doesn’t recur (and did the few times I lapsed).
      But absolutely, PT doesn’t fix everything. But for a lot of things, it’s worth trying, though it might also mean a lifetime of altered habits to keep whatever injury/problem from recurring.
deaux 4 hours ago
Frustrating post. This gives rightful ammunition to the calls of "LLMs need to be avoided for anything medical". Even though the issue is that they're asking it to interpret images. They need to be avoided for that, but that doesn't say much about their medical accuracy outside of image interpretation.
It would already be a huge benefit to 90% of people worldwide if the very first part of most hospital visits would be outsourced to frontier-level LLMs. Yet this kind of misuse just gives the medical industry a stick to beat that idea into the ground.
Oh well, I'm sure there will be at least a few countries that will indeed embrace frontier models for initial diagnostic medical purposes. Maybe medical tourism destinations. But it's unfortunate for those who can't afford the trip.
i4i 5 hours ago
"A 2026 Finnish study published in JAMA Internal Medicine that used magnetic resonance imaging (MRI) scans to look at patients’ shoulders found that 99% of Finnish adults over 40 have at least one rotator cuff abnormality." https://brainlenses.substack.com/p/abnormality
Incidental Rotator Cuff Abnormalities on Magnetic Resonance Imaging https://jamanetwork.com/journals/jamainternalmedicine/fullar...
linsomniac 18 hours ago
~2 years ago I used ChatGPT "deep research" to investigate a chronic sinus infection I'd been fighting for ~3 years. After seeing 3 GPs and 3 visits with an ENT, I fed all the observations I had into the AI. In particular, I couldn't get the ENT to explain why he visually saw, via a scope, evidence of allergic reaction in my sinuses, but then later concluded, after an allergy test, that it couldn't be treated via allergy medication. I asked this question a few times and he just never answered.
ChatGPT surfaced an NIH study that concluded that 20% of people have allergic reactions that are isolated to a body location, and that shoulder "skin prick" testing may not reveal. I asked him about that and he said "that's not how allergies work". Full stop. He was unwilling to even look at the study.
He prescribed a CPAP and regular nebulizer treatments. Side story: the CPAP place sent me an SMS message that I couldn't recognize was not a phishing attempt, and when I reached out to inquire who they were they never replied.
So I decided: Let me just try taking a second-gen allergy tablet every day and see what happens.
My sinus infections have gone away. Previously I was getting a major sinus infection at least quarterly. Maybe he's right that allergies don't work that way, but allergy tablets have absolutely solved my problem. Which I'm thankful for because I tried a CPAP for a solid month a few years ago and I just could not get used to it, and was sleeping like crap.
[-]
- nostrebored 18 hours ago
  Daily allergy tablets are associated with huge increases in early onset Alzheimer’s. Glad you found something that works, but might be good to get some of the allergen injections :)
  [-]
  - linsomniac 15 hours ago
    That seems to be only for first generation, drowsy-making, tablets. Second gen formulas don't cross over the blood/brain barrier.
    https://www.myalzteam.com/resources/zyrtec-and-alzheimers-me...
    There IS one year-old finding that suddenly stopping Zyrtec after daily 3-month use may lead to nasty itching, and if that happens you can re-start and then taper off. https://www.fda.gov/drugs/drug-safety-communications/fda-req...
    [-]
    - amluto 9 hours ago
      Zyrtec/cetirizine is a weird one. Everyone seems to agree that it is a second-generation antihistamine. The FDA seems to play along. But there is no particular shortage of anecdotal evidence of people who find it quite sedating (hi there!), there are some papers questioning its status [0], and the FAA puts it in a category with diphenhydramine, not with loratadine and fexofenadine [1].
      [0] https://pmc.ncbi.nlm.nih.gov/articles/PMC1118461/
      [1] https://www.faa.gov/ame_guide/media/AllergyAntihistamineImmu...
      [-]
      - mtlmtlmtlmtl 7 hours ago
        Cetirizine does cross the BBB to some extent, just less so than a lot of first gen ones. So it can still have some hypnotic effects, sure.
        But that's not the important difference here. The important difference is that cetirizine has negligible antimuscarinic effects, unlike DPH, meclizine, cyclizine et al. Antimuscarinics are nasty drugs, and the antimuscarinic activity (and sometimes other non-histaminergic activity as well) is why a lot of first generation antihistamines are so bad for your brain.
  - cenamus 18 hours ago
    Where are you getting that from?
    All I can find is about 1st gen antihistamines (i.e. Benadryl, which I doubt many people take daily, because of the drowsiness).
    Even for those, evidence seems to be mixed at best. "Huge increases" seems like hyperbole.
  - fuomag9 16 hours ago
    Only first gen, 2nd gen does not have this issue anymore or it’s greatly reduced
  - meindnoch 18 hours ago
    Misinformation.
    Only first-generation antihistamines with anticholinergic effects are associated with cognitive decline in elderly patients.
    [-]
    - ForceBru 16 hours ago
      LMAO at how the two of you sound authoritative and knowledgeable, but neither linked to ANY studies (or at least personal anecdotes) to support your claims.
      Yet here we are, warning each other about the dangers of LLM hallucinations. Humans "hallucinate" (provide random authoritative-looking information without anything to back it up) pretty often too.
  - tnchr 18 hours ago
    I believe it depends on which ones, the older gen or certain classes of antihistamines
  - darkwater 18 hours ago
    Wait, what?? Now I'm going into panic mode because I regularly take antihistamine tablets/pills (the newer ones, based on ebastine, because they don't make me feel sleepy)
    [-]
    - linsomniac 15 hours ago
      You don't seem to need to worry, second gen tablets seem to be fine: https://claude.ai/share/9c6eaaa5-e734-4267-b540-ddc0188daf6b
      [-]
      - yosame 12 hours ago
        Posting a claude chat is not actually helpful. The chat doesn't even cite any sources.
        I think this post is a decent summary, the answer is a soft maybe: https://www.health.harvard.edu/mind-and-mood/should-i-worry-...
        Second-gen tablets might increase dementia risk by a small-to-medium amount (there's almost certainly still a small degree of CNS activity, and we don't know what causes dementia in the first place), but researching it will be difficult. Dementia is hard to research because of how long it takes to develop and its poor coding within health data, and antihistamines are hard to research because they're not often prescribed and aren't available in the health data.
        If it's a large effect, those factors wouldn't matter, but smaller risks are harder to detect and more sensitive to bias. If you want to minimise dementia risk, then reducing antihistamine use might be warranted, but you're probably better off addressing the risk factors we do know about: https://www.dementia.org.au/brain-health/risk-factors-develo...
        [-]
        linsomniac 9 hours ago
        >Posting a claude chat is not actually helpful. The chat doesn't even cite any sources.
        You didn't read it very closely, I posted it specifically BECAUSE it cites sources (14 of them by my count).
        edit: I sit corrected, if I open it in Incognito mode the citations are removed. That's not very useful.
        edit 2: Here is a gist with the citations: https://gist.github.com/linsomniac/6d2bdeb0f63cf504354b067e2...
- braiamp 16 hours ago
  Ok, there's a lot to unpack here and you really had the deck stacked against you. First, let's go from the top: once a test says X, disproving that X is really hard. And that's not unique to the medical profession, it's inherent to all humans: we suck at revisiting or revising our decisions, much less at considering the possibility of reversing them.
  Which moves us to the next two issues: liability and time. Any time you ask someone to revise a decision, especially with the stakes the medical profession has, you will find that nobody has the time or the inclination to open themselves up to a mess.
  Now, if you really want to be successful, you have to suggest the tests that the study used before they even have a case with you, and especially before the diagnostic loop closes, since that has the best chance of looking at the right thing. Just be straight that you walked in with a theory. Doctors notice when they're being steered way faster than they notice when you're actually right. That's how you work with systems that have an overworked mass trying their best.
  [-]
  - linsomniac 15 hours ago
    >before they even have a case with you
    My problem is that I needed information from 2 ENT visits to feed into ChatGPT to get that study. On the first visit he scoped my sinuses and immediately said "I can see evidence of allergic reaction, see those white bumps?". On the second visit I got an allergy stick test and it came out negative.
    Those helped lead to that NIH study. It would have been very hard to have walked in with that study in hand.
    [-]
    - braiamp 11 hours ago
      I mean, for the second (third?) option specifically. You will notice that they will be more open to go outside of the standard of care if you have a working theory.
rasmus1610 18 hours ago
As a radiologist I have found Claude and ChatGPT to be absolutely terrible at MRI and I would not trust them one bit. They have their merits if you need to research stuff that is more text based, but radiological images are just something that they cannot interpret well enough (yet).
[-]
- lostlogin 18 hours ago
  AI makes up for its poor reporting by enhancing the images.
  Current Siemens MR software ‘Deep Resolve’ makes up the signal (adding about 50%), then makes up every second pixel, and then, for 3D sequences, makes up every second slice. It’s knocking about 59% of the time off each sequence. And it’s really really good. I’m an MR tech.
  [-]
  - rasmus1610 16 hours ago
    but those are two different things. Of course something like Deep Resolve is great, as are modern model based reconstruction algorithms for CTs, but here we are talking about LLMs and their ability to interpret medical images, which has nothing to do with what you said.
  - microgpt 17 hours ago
    Sorry? You use AI to hallucinate medical images and that's good?
    [-]
    - uecker 17 hours ago
      It is not really the same as LLMs. I wouldn't call it AI. And I wouldn't say "makes up". I work in this field and this is certainly based also in part on my research.
      [-]
      - lostlogin 15 hours ago
        ‘Makes up’ is inaccurate for sure. But it’s not strictly true to call it acquired data either.
        After years of collecting artifacts and errors, I have more and more respect for the tool.
        But it’s jarring. I open a sequence, decrease the acquired resolution, add the AI and get a scan that’s quicker and higher resolution.
        It’s an amazing time to be an MR tech.
        [-]
        uecker 15 hours ago
        It is amazing. It is the result of two decades of research in image reconstruction algorithms. The machine learning is part of it, but that it is sold as "AI" has probably more to do with marketing.
        [-]
        fluidcruft 14 hours ago
        I haven't seen it marketed as "AI" by GE, Siemens or Philips. They usually gesture at "deep learning" or "compressed sensing".
        No radiologist is buying "AI" scanners. Radiologists are probably among the most jaded of an audience about the word "AI" due to decades of undelivered promises. AI is synonymous with "worthless trash" to them, not to mention everyone says "AI" is going to put them out of work. lol
        lostlogin 14 hours ago
        It certainly has a lot of marketing behind it.
        https://marketing.webassets.siemens-healthineers.com/2861d15...
      - microgpt 13 hours ago
        Super-resolution is certainly distinct from hallucinating - it just rearranges data that was already there to make it easier for the human eye to see - but should be used with care. I can easily imagine that an upscaling algorithm makes it so a certain defect is clearly not present, when the source image is ambiguous (which the radiologist would have noticed), and in reality the defect is present.
        [-]
        shiandow 5 hours ago
        I would definitely be wary using the more advanced super resolution schemes. It took some work preventing it from drawing faces everywhere.
        MRI is already a form of compressed sensing, I would much prefer statistical forms of super resolution to ones based on training data. Even if it is only trained on MRIs it will see some noise and plausibly expand it into whatever disease fits.
    - sota_pop 12 hours ago
      Most upscaling and super-resolution techniques I’ve seen use various implementations of interpolation; typically nearest-neighbor approaches. Although I don’t work in the medical field and haven’t checked in on the research at least since ViTs overtook CNNs for other areas of computer vision.
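      For anyone unfamiliar, this is roughly what plain nearest-neighbor upscaling looks like in Python. It is only a sketch of the classic interpolation idea on a toy 2D grid (the function name is made up), and says nothing about how Deep Resolve or any vendor's reconstruction actually works.

```python
def upscale_nearest(image, factor):
    """Upscale a 2D grid of pixel values by an integer factor (nearest-neighbor)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in image:
        # Repeat each pixel `factor` times horizontally...
        wide = [px for px in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


tiny = [[1, 2],
        [3, 4]]
big = upscale_nearest(tiny, 2)
for row in big:
    print(row)
```

      Every output value is a copy of an input value, so nothing new is invented, but nothing is recovered either. Learned methods differ exactly in that they fill in values from a prior.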
    - gavinray 17 hours ago
      It's just DLSS/Frame Generation for MRIs.
  - throwawayffffas 12 hours ago
    Sure, but Claude and ChatGPT are not Siemens 'Deep Resolve'.
- pickleRick243 17 hours ago
  It's like people who expect ChatGPT to be really good at chess because chess engines with super-human performance have been around for decades, so obviously the latest frontier LLM that took billions to train should find the task trivial.
  Actually, I'm curious what ChatGPT 5.5's Elo is; I wouldn't be too surprised if it's 2000+ just from its basic understanding of chess principles from all the content it has digested.
  [-]
  - simonreiff 8 hours ago
    ChatGPT is completely unplayable at chess on its own. It's unable to keep track of the state of the chess position and therefore will make an illegal move within about 10-12 moves. I would put GPT-5.5's rating at 400, since it can't even make legal moves reliably.
    I've tried to play chess with GPT-5.5, even played it again tonight, allowing it to use `python-chess` to keep track of the state of the position and to get a list of legal moves at each turn, so that it was fair. I also gave it blindfold odds, again to make it a fair fight, but it was not even close. GPT still isn't better than maybe 1000 Elo, maybe 1200 tops. Even with what amounts to being able to see the position and also being unable to make an illegal move, GPT-5.5 hangs material left and right, doesn't make a plan, and got smoked even when I gave it blindfold odds, to the point it's boring for me to play even under those conditions. I'm not sure it's better than whatever the GPT model was that was out about 8 months ago. I also thought it might be somewhat better than a beginner due to reading chess books, but no, it's complete garbage at playing chess, not even average-level skill.
    [-]
    - red75prime 5 hours ago
      That is because no one has bothered to finetune or RLVR GPT-5.5 on chess games. Even open-llama-3B can be finetuned to around 1700 Elo[1].
      [1] https://arxiv.org/pdf/2501.17186
  - nicksergeant 15 hours ago
    Interestingly LLMs are extremely bad at chess position _images_. I have to imagine if you give it positions in text it'd be pretty great but when I was learning chess and pasting images of positions in for analysis I couldn't believe how wrong it was. I actually thought it was looking at the board in reverse but even when pointing out problems it seemed completely incapable of understanding what it was missing (of course... it doesn't really "understand" anything).
    LLMs truly are marvels with text but anything spatial seems to really mess it up, somehow.
    [-]
    - unholiness 13 hours ago
      > I have to imagine if you give it positions in text it'd be pretty great
      Not at all? LLMs are a terrible match for the kind of analysis a chess engine does (scaled deep search, deeply trained position evaluations). It's just not that kind of tool.
      [-]
      - nicksergeant 13 hours ago
        I suppose that's also a good point!
nostrebored 18 hours ago
I don’t understand the negative reactions. Medical care as it exists requires the doctor and patient to have their brains switched on. I’ve almost never had a problem where a doctor provides me with a diagnosis and I go about my day. Most of the times that I have, I’ve been confident about the problem and known what I needed. The doctor was a barrier to accessing care.
Dr. GPT is a good brainstorming tool. It helps synthesize information in a way that primary texts don’t. But it does force you to say “that doesn’t make sense”.
I do think that people saying “doctors don’t know the state of the art” have a weaker case. If you think about it in terms of token density during pretraining and how post training datasets are constructed, I think it would take us a very long time to adapt to any fundamental shifts. If we have forgotten how to cure scurvy, how many journal articles would it take before we adapt to a discovery?
[-]
- StefanBatory 5 hours ago
  > I do think that people saying “doctors don’t know the state of the art” have a weaker case.
  This is kinda the case though. In Poland I have met only one psychiatrist who knew about DSM-5, and that was this year. DSM-5 has been a thing since 2013.
  Doctors are people just like us; not every one of them is good.
  [-]
  - bonesss 3 hours ago
    Many DSM-5 diagnoses come into effect with the ICD-11; ICD-10 lacks a good deal of them, and that rollout is still fresh & ongoing.
    It is kinda spooky, though, to have freshly minted doctors from a few years back whose school-knowledge will forever be "outdated and archaic" based on standards published before they were in school.
    Some good advice I got: treat this as a generation shift, find younger and newer doctors who are familiar with the "modern" standards.
    [-]
    - peyton 3 hours ago
      APA guidelines come with expiration dates [1]. 10 years is the maximum.
      Churn is built in to the specialty. Read into that what you will.
      [1]: https://www.apa.org/practice/guidelines/criteria?item=5
  - roryirvine 3 hours ago
    Why would you expect a Polish psychiatrist to understand the differences between different versions of a diagnostic manual used only in the US?
misja111 4 hours ago
As someone who has had shoulder issues for the last 25 years or so, including partial tendon tears, I can tell you that even if your tendon had been damaged, the treatment would have been strange. With moderately damaged tendons, you want to:
1. stop any inflammation, by taking NSAIDs for a few days
2. detect and correct any behavioral patterns that could have caused the presumed overwear of the tendon
3. start physiotherapy to strengthen those muscles that can take over the load from the damaged tendon
These are not quick fixes, because quick fixes don't exist here. Stuff like shockwave treatment, massages etc will only lessen the problems for a few hours at most, after which they will come back.
jeswin 19 hours ago
I would not trust AI on images. But I once had ChatGPT tell me that an MRI report was very likely to be incorrect based on the text, and offered a different diagnosis. Since it was semi insisting, I visited another doctor who made me do a retest. Long story short, ChatGPT was correct.
Again, this is just one single person's experience. So not worth much.
[-]
- ferfumarma 9 minutes ago
  This sounds fascinating. Can you provide any detail regarding the nature of the diagnosis or problem it identified?
- nostrebored 18 hours ago
  I think that much of the visual gap is because what to attend to in images is less structured. Anecdotally small qwen finetunes (ie less than 10B) take task accuracy from sub 30% on FMs to 90%. We have sold some of these for outcome based back office tasks.
  I think we’ll see a lot of specialized VLMs that provide real value.
- energy123 15 hours ago
  Anecdote but I gave Gemini Pro an image of an individual with Herpes Zoster which the doctor said was something else. Gemini gave the correct diagnosis which allowed for correct treatment and cure.
  I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego?
  I can understand for radiology because you need a specialized convolutional network, but for more knowledge based things...
  [-]
  - alwa 13 hours ago
    “A man with a watch knows the time; a man with two watches is never sure.”
    I imagine reasons for what you’re asking might include:
    * Prompting an LLM is work, and they’re already overworked just doctoring—every conversation with a computer is a conversation you’re not having with a patient;
    * They’re probably right more often than they’re wrong;
    * “When you hear hooves, think horses, not zebras”: the 15th case today of strep throat is probably strep throat, regardless of today’s 15th falsely-confident LLM weighing-up;
    * They tend to have spent many many years honing a clinical intuition that makes an examination, to some degree, hard to articulate fully to the LLM;
    * Liability/overdiagnosis: All this stuff is probabilistic. Inevitably, there’s going to be a time when the LLM throws out something I thought unlikely that turns out to be right, and there will be other times when it’s wrong but now I have to document why. How many false leads do I need to chase per one true differential? Does this really compare favorably to seeking a second opinion from another human doctor?
    * Not everything needs to make it into the record. Once it’s in the LLM, it’s discoverable and litigable and hackable and permanent;
    * Medicine is practiced in very different ways in different contexts—even in this thread, one radiologist routinely orders ultrasounds for soft tissue shoulder problems, and the other medical-world person replying has never heard of such a thing—presumably both within US health care contexts. Some doctors hand out antibiotics like candy, others are more cautious with respect to resistance. What’s right can depend on the time, the place, the clinical setting—more than just the immediate patient-level facts at hand, in ways that become awkward or unwise to express explicitly.
    And of course… who’s to say they don’t do LLM-assisted research, in cases where they think it might be helpful?
  - fc417fc802 13 hours ago
    > I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego?
    Either that or laziness I'd imagine. This isn't limited to LLMs. Expert digital assistant systems that you query have existed for a long time. A good physician will double check anything even slightly unexpected against one.
- senectus1 11 hours ago
  mate the other day chatGPT (enterprise) told me that the kernel 7.0.2 was older than 6.69
  you can't trust these toys at all. that doesn't make them useless, just untrustworthy.
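  For reference, a minimal sketch of how dotted version numbers are usually ordered: split on the dots and compare the integer fields left to right. The helper names are hypothetical, and the sketch assumes purely numeric versions (no suffixes such as `-rc1`).

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version string such as '7.0.2' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_older(a: str, b: str) -> bool:
    """Return True if version a sorts before version b, field by field."""
    return parse_version(a) < parse_version(b)


if __name__ == "__main__":
    print(is_older("7.0.2", "6.69"))  # field-by-field: 7 vs 6 decides it
    print(is_older("6.9", "6.69"))    # numeric fields: 9 vs 69
    print("6.9" < "6.69")             # plain string comparison, for contrast
```

  Comparing the raw strings instead of the parsed fields is the classic way to get this wrong, since "6.9" sorts after "6.69" character by character.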
  [-]
  - HPsquared 3 hours ago
    6.69 hasn't been released yet, to be fair.
ricardobayes 19 hours ago
That might be doctors' new nightmare: people who second-guess everything with AI. Previously it was "google your symptoms".
[-]
- mettamage 19 hours ago
  Well I live in the nightmare that is the Dutch healthcare system [1]. There are many things that they will fix but they didn’t fix my sleep. A friend fixed my sleep. He is a doctor and prescribed me the right thing. The thing is, he shouldn’t have had to intervene. Without him I could have ended up poor and destitute as my sleep was wrecking me.
  And yea, I already did all the standard things. CBT for insomnia helped somewhat. My insurance didn’t fully cover it either, unless I was willing to wait for 8 to 12 months.
  And I recently met someone with slow moving metastatic cancer. Thanks to LLMs they will most likely live another 3 to 5 years extra since the Dutch conventional mainline treatment hasn’t been taken yet. But it is German doctors that helped them and Belgian doctors that pointed out in a second opinion that a lot more can be done.
  LLMs have a part to play. The false positives are awful, but I have seen an average of 5 out of 10 care when things become too complicated.
  Except for trauma treatment. The Dutch healthcare system is amazing once they diagnose classic PTSD.
  So it’s definitely not all bad but the trust I had when I was younger has been eroded quite a bit and LLMs can meaningfully step in, in my case at least.
  [1] I know there are worse systems. But from what I have heard there are clearly better systems nowadays. It has slipped a lot
  [-]
  - simianwords 19 hours ago
    Hey what did you do to fix your sleep? Help us all and maybe an llm will index your diagnosis (hi ChatGPT)
    [-]
    - mettamage 17 hours ago
      For me what helped is taking 7.5 mg of mirtazapine. At higher levels it's an anti-depressant but at lower levels it's an anti-histamine. It gets me drowsy. Together with 0.3 mg melatonin it knocks me out. I only take it 3 times per week max to not have habituation kick in.
      So 3 days out of 7 days I have guaranteed good sleep. The other 4 days are a toss up. But an average of 5 days of good sleep is much better than 3.5 days out of 7 days.
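      The arithmetic behind those numbers, as a small sketch. It assumes "toss up" means a 50% chance of a good night; the function name and parameters are made up for illustration.

```python
def expected_good_nights(guaranteed: int, total: int = 7, p_good: float = 0.5) -> float:
    """Guaranteed good nights plus a p_good chance on each of the remaining nights."""
    return guaranteed + (total - guaranteed) * p_good


if __name__ == "__main__":
    print(expected_good_nights(3))  # three medicated nights per week
    print(expected_good_nights(0))  # no medication, every night a coin flip
```

      With three guaranteed nights this gives 3 + 4 × 0.5 = 5 good nights per week, against 7 × 0.5 = 3.5 without.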
      [-]
      - randycupertino 10 hours ago
        Interesting. There recently was an article about how premenopausal and menopausal women are taking antihistamines with pepcid to help them sleep due to it going viral on tiktok.
        https://www.thecut.com/article/antihistamines-pepcid-ac-peri...
        > Then, a few months ago, Angela saw a social-media post from a woman who took daily anti-histamines (like Allegra, Claritin, or Zyrtec) plus Pepcid AC (a common antacid) for her perimenopause symptoms. Her results, as reported, sounded miraculous: no more brain fog, no more tossing and turning all night. Even her mood vastly improved.
        [-]
        masklinn 8 hours ago
        Technically Pepcid is also an antihistamine, but h2. And h1-antihistamines have been used for sedation for a long time tho the effect is very much ymmv (first one I tried OTC would knock me out but not let me rest at all, not a keeper).
      - Delk 10 hours ago
        AFAIK mirtazapine shouldn't cause habituation the way actual "sleeping pills" or benzodiazepines do. That's one of the reasons it may be preferable as a sleeping aid, especially in the longer term.
        Anecdotally, when I took mirtazapine for sleeping problems, it did sometimes seem to have a stronger effect the first time I took it after not using it for a while. After that the effect stayed stable. Overall it shouldn't cause habituation, and my doctor said as much.
        Of course trust your doctor and not strangers on the internet, though.
        [-]
        mettamage 4 hours ago
        > After that the effect stayed stable. Overall it shouldn't cause habituation, and my doctor said as much.
        Yea so this is where it gets murky for me. I experience some habituation actually. But my actual doctor went like "wtf is this?" and she didn't really mention what she knows about it. So on this particular pill my friend is my doctor. Not an ideal situation. I mean, he is an actual doctor but for him to be my doctor in this is a bit fucked up. He knows a lot more about mirtazapine than my GP though since he read up on it.
        [-]
        Delk 1 hour ago
        > I experience some habituation actually.
        I suppose it can be quite different for different people. I stopped using it because it often (not always) made me still feel tired and unfocused in the morning, something that apparently also doesn't happen to everyone.
        > But my actual doctor went like "wtf is this?"
        Different country, but where I live, prescribing low-dose mirtazapine for insomnia appears to be fairly common practice even though it's off-label. I've had it suggested or mentioned by three or four different doctors, including GPs. I also know several other people who have been prescribed it.
        The doctors here seem to prefer low-dose mirtazapine as safer over typical CNS depressants such as benzodiazepines for insomnia nowadays, at least if the problem may be longer-term.
        So it's not really something particularly weird. Of course different countries also have different medical cultures so I guess it's not surprising if it's not that common in other places.
      - masklinn 17 hours ago
        Is the dutch healthcare system broadly against hypnotics? Culture (of the country or its medical system) can massively influence prescriptions or their lack thereof e.g. france is pretty famous for prescribing hypnotics very easily (and having a broad range of them), while the UK is generally a lot more reluctant.
        [-]
        ricardobayes 5 hours ago
        True, for example in Switzerland you can't really get melatonin, even if you do it costs an absurd amount (like 100 chf). Doctors seem to be really against it in Switzerland.
        [-]
        mettamage 4 hours ago
        I find Dutch over the counter melatonin to be really good, just FYI [1].
        [1] https://www.kruidvat.nl/shiepz-melatonine-time-release-0-1mg... - Shiepz Melatonine Time Release 0,1mg Tabletten
        I personally take 0.3 mg, two hours before bed. I've done this for about 2 years now. It still works. I know, anecdata, but as you can tell the dose is low.
        dripdry45 14 hours ago
        yeah, I’m surprised Trazodone didn’t get mentioned as a very low dose
        [-]
        mettamage 4 hours ago
        Never got mentioned yea
    - greybox555 7 hours ago
      If you have sleep issue related to overthinking or racing mind, you may try fastsleep.app
      Instead of music, long podcasts you are given something to imagine at a time interval.
      Like if you hear "calm river", imagine that. If you hear "heavy rain over a tree", imagine that.
      In short → Close your eyes, listen & imagine.
      [-]
      - mettamage 2 hours ago
        This seems like self promotion with no contribution to Hacker News. You're an account that has 2 karma, and only exists for less than a month [1]. Also, it's a bit of a weak comment in general.
        [1] Account details when I wrote this down:
        user: greybox555 created: 27 days ago karma: 2
- js2 19 hours ago
  The NYT did this profile a while back: "Ben Riley was already writing about the risks of chatbots when his dad started trusting A.I. over his doctor."
  The dad was a retired neuroscientist who delayed cancer treatment against medical advice because he was certain he had been misdiagnosed based on his own research that he did with the help of A.I.
  https://www.nytimes.com/2026/04/13/well/ai-chatbots-cancer.h...
  There's a comment on the article from Ben Riley:
  > I am very grateful to Teddy Rosenbluth for sharing my father's story with the world, her kindness and curiousity proved to be restorative in ways I didn't anticipate.
  > The two words that everyone used to describe my dad: "intelligent" and "kind," and he was indeed both of those things. The sad irony here is that it was his human intelligence, combined with these strange new tools that purport to be a form of 'artificial' intelligence, that led to his ill-advised decision to forego the treatment he needed for his CLL. A doctor has already commented on this story with the observation that AI "confidently asserts erroneous conclusions," and we simply have no idea how often this is happening or the magnitude of the harm that results.
  > Not a day goes by that I don't feel the pang of my father's absence. He might still be here if not for AI. I try not to think about that, but sometimes I can't help myself.
  [-]
  - rvnx 18 hours ago
    The context is very important: decades of a poorly-diagnosed chronic illness had left him deeply distrustful of the medical system.
    This is the real root issue.
    At 75 years old, he was stubborn. Is that reasonable ? Yes, perfectly. Could he have been right since the beginning ? Certainly. Did he deny evidence ? Yes.
    Zero doubt that he was intelligent, everything points toward that direction, but that doesn't make a person less stubborn, because accepting the evidence, is also accepting that you were wrong if you initially postured yourself as adversarial instead of cooperative.
    He would have read Wikipedia, scientific papers, etc, even without AI.
    He did not want to be convinced. It works both ways:
    https://www.foxnews.com/health/woman-says-chatgpt-saved-her-...
    or
    https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna...
    Nonetheless, someone very smart, just didn't want to move from his position.
  - bensonperry 16 hours ago
    i mean, other smart people have famously delayed cancer treatment without needing poor guidance from LLMs! that's not at all new or unique to LLM chatbots
  - ieie3366 18 hours ago
    GPT-4o, which is what that article is most likely about, was an older low param count slop model which was known for abusing emojis and sycophancy. It does not really have any relevance to latest claude frontier models.
    Your comment is akin to saying "Karen from facebook who is a human pushed essential oils and ivermectin as a cure to cancer. Now doctor Y is suggesting chemo. Both are humans, humans cannot be trusted!"
- w10-1 15 hours ago
  It's not just the second-guessing. It's the getting in the ballpark but striking out: explaining in detail why they are not correct. A little bit of patient knowledge requires a tremendous amount of doctor time to explain away the ignorance.
  It's a 180 for me: While I believe doctors should explain diagnosis or treatment decisions when asked, I don't believe they should be taxed with explaining away alternatives. In my anecdotal 2nd- and 3rd-hand experience, doing that is taking at least a third of their time (on roughly 5% of the patients who think demanding answers will make things better) -- with zero improvement to diagnostic accuracy or treatment effectiveness. Doctors already consult with other doctors, and it makes no sense for them to have to consult with ignorant patients or treat their AI psychosis on top of their disease. It doesn't increase patient autonomy any more than adding a steering wheel for child car seats would help toddlers learn to drive.
  [-]
  - mindslight 5 hours ago
    Explaining diagnosis and treatment recommendations decisions inherently involves explaining away the alternatives. In this world where patients are ultimately responsible for our own care, explaining your rationale is a straightforward part of the job - otherwise there is nothing for patients to base their decisions on apart from how the options make them feel. If visits haven't been allotted enough time to get the job done, then that is something you need to take up with health plan bureaucrats rather than taking it out on patients.
- nosioptar 19 hours ago
  I asked a clanker about symptoms I was having. (I'm not an idiot, I was already on my way to hospital, clanker was just to take my mind off symptoms during the drive.)
  The clanker said I'd be fine, I just needed some rest and OTC meds.
  The medical staff immediately turfed me to surgery because the same set of symptoms I told the clanker were enough to concern them that I needed emergency surgery.
  Had I listened to the clanker, I'd be dead because I did need emergency surgery. (Hell, I almost kicked the bucket because I waited for someone to wake up to give me a lift because my insurance probably doesn't cover an ambulance ride.)
  [-]
  - throw310822 19 hours ago
    Very curious what made you run to the emergency first thing in the morning that an LLM understood as "just normal, take some OTC meds and wait".
    [-]
    - Aachen 16 hours ago
      Not OP but wanted to note that you're also likely to get different results based on the language you use (it'll respond differently to dialects of English, for example) and the RNG seed of the current session. These things are still probability engines and even if you know the exact symptoms, this might not be reproducible
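      A toy sketch of the seed effect. The answer list and weights below are entirely made up, and a real LLM samples token by token rather than picking a whole answer, so treat this only as an illustration of seeded sampling from a fixed distribution.

```python
import random

# Hypothetical answer distribution for one fixed prompt; labels and weights are invented.
ANSWERS = ["rest and OTC meds", "see a GP this week", "go to the emergency room"]
WEIGHTS = [0.5, 0.3, 0.2]


def sample_answer(seed: int) -> str:
    """Draw one answer from the toy distribution using a seeded random generator."""
    rng = random.Random(seed)
    return rng.choices(ANSWERS, weights=WEIGHTS, k=1)[0]


if __name__ == "__main__":
    for seed in range(5):
        print(seed, sample_answer(seed))
```

      The same seed always reproduces the same draw, while different seeds can land on different answers for an identical prompt.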
- bilsbie 18 hours ago
  It’s funny every profession deals with customers making their own guesses at diagnosis.
  I told my mechanic the flim flam is broken but he said it was the rim ram. He fixed it and we all went on with our lives.
  But doctors insist on this God like status so it’s a “nightmare” when patients try to help themselves.
  [-]
  - __MatrixMan__ 14 hours ago
    I dunno man, it's one thing to have your car still be broken because you were wrong, it's a different thing to poison yourself on the basis of having done your own research. The mechanic can laugh at you, it hits a doctor differently.
  - nicman23 17 hours ago
    you are literally taking sleeping pills ..
- weatherlite 19 hours ago
  Nightmare because they're always right and the A.I second guessing is always wrong, or because they just don't like to be second guessed?
  [-]
  - b800h 15 hours ago
    Well it was a nightmare for my mother's do-nothing GP surgery in the UK. She had several conditions which were being handled completely separately without central coordination, and her health was in serious decline. We went in with a list of 20 AI-generated questions based on her conditions and treatment (which I was able to screen as I have a bio postgrad, but not medical training), including those related to NICE guidelines and procedure, and, frankly the GP bricked it and ordered a load of new interventions. My mother started to get proper treatment.
    I wouldn't trust AI to make a diagnosis, but I would absolutely trust it to notice where procedure hasn't been correctly followed, where a treatment is contraindicated because someone has missed a line on a health record, or where there's a clear potential alternate diagnosis which has been missed for spurious reasons. Also, unfortunately, where doctors aren't doing a decent job - often because they're overworked or underfunded.
    [-]
    - ricardobayes 5 hours ago
      UK has probably the worst healthcare in the developed world. In part perhaps due to UK blindly accepting any kind of medical degree (doctors and nurses) from all over the globe. Yes, you heard that right, they verify the validity of the degree but there is no formal standardized exam to sit to practice in the UK.
  - tuvix 19 hours ago
    There’s more than two options here. It was already difficult to deal with self diagnosis for doctors, now we have a machine that outputs recommendations, and does it with confidence whether it’s correct or not.
    The same issues that were present with search-engine self diagnosis are still present with LLMs. If you provide Google with an incomplete list of symptoms and can’t interpret the information you find correctly, you will likely get an incorrect diagnosis. The same is true for LLM output.
    [-]
    - weatherlite 3 hours ago
      > It was already difficult to deal with self diagnosis for doctors
      I get it. But the current system is also super difficult for the patient: getting time to ask questions, get clear answers, get the best possible diagnosis taking into account your history, symptoms etc and all that in 5-10 minute checkup when your doctor sees 50 patients a day and has very little time for you; this doesn't scale well. Patients run to A.I for a reason.
    - weatherlite 11 hours ago
      The A.I is only gonna get better, and fast. Doctors should simply double check themselves by using A.I.
    - dheera 11 hours ago
      Everyone on the internet loves to put doctors on a pedestal, but I think upwards of 30% of my doctor visits have been misdiagnosed.
      There's a reason I ask AI about absolutely everything medical and there's a reason I keep extra quantities of prescription medications around for emergencies. I've saved my own ass a lot more times than the doctors have, thanks to good doctors not being available.
      [-]
      - ricardobayes 5 hours ago
        That works in your case because you likely already have an analytical mindset and can reliably discard information or research it further. It's the same reason AI is an enabler for people who already have some skill. But it can be very dangerous or misleading for people who blindly believe all output.
    - rvnx 18 hours ago
      There are quite a few disclaimers everywhere that soften confidence: "always ask a medical specialist", "I'm not a doctor", "this could have been this or that but really not sure", etc.
      [-]
      - neonstatic 16 hours ago
        No one cares about this, especially those who believe the machine. It's just there for the provider to avoid responsibility.
  - vimda 19 hours ago
    Nightmare because users approach LLMs with the false confidence that they're always right, and present LLM outputs as fact to Doctors who have to waste time explaining that it's wrong most of the time. It hurts more than it helps.
  - mixologic 19 hours ago
    It's a nightmare because it erodes trust. Doctors are not "always right" which is why "always get a second opinion" is codified in culture.
    But AI's problem is that it's completely full of shit, sometimes, and the people most qualified to evaluate whether it's full of shit are the doctors, not the patients, but just like OP's original article, patients are left feeling like their second opinion from AI might be more trustworthy than their doctor's opinion.
    [-]
    - weatherlite 10 minutes ago
      > But AI's problem is that it's completely full of shit, sometimes
      It's now quite unusual that it's "Completely full of shit". If it contradicts something your doctor said I don't see why you should feel ashamed to bring it up. Sure it complicates the doctor's work, having ignorant obedient patients must be more comfortable for the doctor, but the end result could be more accurate diagnosis.
    - simianwords 18 hours ago
      The notion that only doctors can verify is false! Doctors are better at verification but normal people can also verify. This is just empirically true.
      Examples of things normal people can verify
      - procedural errors that Claude can capture like some blatantly high dosage (grams instead of milligrams)
      - outdated treatment plan, maybe there’s a credible new treatment plan that’s been used for years but the doctors were not updated
      - literally being injected homeopathic drugs (takes no smart person to flag this)
      Let’s stop talking as if doctors have a divine right here. And let’s accept some agency.
      [-]
      - drw85 1 hour ago
        But those are obvious errors. What if the AI tells you to up your intake of X and it seems plausible? So you up your intake of X by taking some supplement, but upping the intake of X makes your body deplete more of Y and now you have a new or compounding problem.
        A doctor might have never recommended upping X, because they would know what it does to your body. Or they might have suggested additional supplementation to avoid this.
        The fact that LLMs are trained on all public knowledge is a huge red flag, because there is more wrong info out there than right. Especially about health, diet, etc.
        [-]
        simianwords 1 hour ago
        I was about to respond but your last statement implies fundamental misunderstanding of how LLMs work. I don’t think you even know about RLHF and you think good and bad ideas are spread in proportion to how much they are seen on the internet.
  - drw85 19 hours ago
    Nightmare because the AI is just generating a random text that fits the question.
    [-]
    - Legend2440 19 hours ago
      This is not a fair assessment of what AI is doing.
      Studies have found that newer reasoning AIs are about as good at diagnosing illness from a written description of symptoms as doctors are.
      Granted, it cannot actually examine a patient, so we're not replacing doctors anytime soon. But your view is obsolete.
      https://www.science.org/doi/10.1126/science.adz4433
      [-]
      - Retric 18 hours ago
        They are using the “gold standard for the evaluation of expert medical computing systems” not a proxy for what a doctor actually does when diagnosing someone.
        It may have some utility after diagnosis, but this test doesn’t demonstrate utility for patients.
      - snackerblues 18 hours ago
        [flagged]
        [-]
        microgpt 17 hours ago
        But I, SCP-426, am a toaster.
    - betaby 19 hours ago
      I feel the same when visiting a doctor in Canada. In the 2 minutes I have with them in one appointment per year I hear a standard text.
    - d1sxeyes 18 hours ago
      Not quite. An LLM generates text that would likely follow. The sky is… “blue”. A patient in pain with a bone protruding from their shin has a… “broken leg”.
      The more training data, the more questions it can answer with a reasonable degree of probability of accuracy.
      Throwing away a potentially useful analysis just because it’s probabilistic seems a bit like throwing the baby out with the bath water.
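      A deliberately tiny sketch of "text that would likely follow": look up a context and return its highest-probability continuation. The contexts and probabilities are invented for illustration; a real model scores continuations with a neural network over tokens rather than a lookup table.

```python
# Hypothetical next-phrase probabilities; every number here is made up.
NEXT_PHRASE_PROBS = {
    "the sky is": {"blue": 0.80, "grey": 0.15, "falling": 0.05},
    "a bone protruding from the shin means a": {"broken leg": 0.95, "sprain": 0.05},
}


def most_likely_next(context: str) -> str:
    """Return the highest-probability continuation for a known context."""
    candidates = NEXT_PHRASE_PROBS[context]
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for context in NEXT_PHRASE_PROBS:
        print(context, "->", most_likely_next(context))
```

      The answer is only as good as the probabilities behind it, which is the point about training data above.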
      [-]
      - drw85 1 hour ago
        But for obvious cases like this, you don't need a second or first opinion.
        This case is about handing a 3D imaging result to a text predictor and hoping for a valid second opinion.
    - poszlem 19 hours ago
      This is a very peculiar use of the word "random".
- gruntled-worker 19 hours ago
  This is obviously going to happen. But sub-par and sloppy doctors are a thing too. Medicine has been using semi-intelligent systems for years that were nevertheless found to improve outcomes.
  We need studies that quantify error rates from each source type, then we need to account for the fact that the artificial type will keep improving.
- ilovecake1984 19 hours ago
  Indeed. I don’t even get what OP thinks they are getting out of this other than doubt.
- consp 19 hours ago
  It can be helpful in your understanding the choices made by asking questions and thus in reassurance, but it requires something most people lack: understanding you are likely wrong since you are just collecting information without understanding it.
  Pretty much like most managers these days, so I understand the frustration of the GPs.
- raincole 17 hours ago
  People should've googled their symptoms and especially the prescriptions they got. It has always been a good practice. If[0] AI proves to be the new google then people should ask AI too.
  [0]: IF.
  [-]
  - sarchertech 17 hours ago
    Do you know how many life threatening illnesses I’ve diagnosed myself with by googling symptoms?
- SeriousM 19 hours ago
  And say it's true because the AI said so.
- gib444 18 hours ago
  It's so much worse than some Google results: people see LLMs as a trusted friend who never talks back and never questions you, who is excellent at convincingly communicating their bs, reeling you in with "tell me more so I can really lock this down", continuing to fool you
  A con artist, a fraud
- rvnx 19 hours ago
  No, this flow is actually very good.
  As in any domain, when you have questions or need a solution, you do your research first, then you ask a specialist.
  If you explain the symptoms and context well, you can get proper advice and then decide on the next step:
```
    Case A) It looks benign and advices / information that you collected seem reasonable, then you go your way.

    Case B) You need second opinion of a specialist because the subject is too complex, or there are medications that you need approval.
```
  Once you have challenged LLMs, and read about the topics over and over then you genuinely become really good at understanding it (especially if you triangulate over LLMs and ask them to challenge, you start to have genuine questions). No matter if the answer is right or wrong, you have elements. Maybe you missed the point, but you come prepared.
  At home you have the time to assess the options, the pros and cons of each approach, and the possible questions to ask, and then challenge the doctor.
  Shared decision-making is an actual evidence-based model of care, and patients who arrive understanding their condition and carrying specific questions tend to get better attention and better outcomes.
  Some doctors get annoyed, because they have a big ego and choose to be patronizing, but it is exactly their job to answer such questions.
```
    With LLMs, it's quite good, you get nuanced and rather useful answers.

    Before LLMs, no matter the topic you searched for, the answer was the same: "you have cancer / an [obviously deadly] rare disease"
```
  The other problem, in many places:
```
    • The doctors are not affordable
    • They are too busy for you (< 15 minutes)
    • You may need to wait months to get an appointment
    • They are not good (the countryside is an example, and sometimes even country-level)
```
  + you can have all of these factors together.
  So, you have something deeply bothering you, your only appointment is in 4 months. It would be insane not to take the time to explore different solutions and not to come informed about the topic.
  If you express your prompt properly and do not rely on imagery, you can absolutely get top-tier advice.
  [-]
  - neonstatic 16 hours ago
    Agreed. This gets worse in cultures in which doctors have no habit of educating the patient, or haven't been trained that it is part of the job. Whenever I am back in my birth country, I specifically avoid doctors who are older than their mid 30s, because they all have the same terrible bedside manner. They might be good at diagnosing and treating, but they never, ever explain anything, even when asked. Some even have "helpful pamphlets" to hand to the patient - anything to avoid explaining. It seems that in their view their job is not helping the patient, but completing a task - running a scan, performing a procedure, administering medicine etc. The human who is the subject of the task is invisible.
niceworkbuddy 21 minutes ago
This is like someone not knowing software development at all doing a code review.
petercooper 1 hour ago
I'm pro-AI, but when I can't even give Claude a photo of my fingers clearly on four different piano keys and get an accurate result of which keys I'm pressing, I'm bearish on its visual analyses. (With that said, I imagine it's been trained on more photos of MRI scans than people playing piano, so I could see how it might provide interesting ideas for a radiologist to consider.)
stefs 2 hours ago
> My hope is that in a couple of model generations, we'll trust AI to review MRIs the way we trust it to proofread our emails.
the doctors may have financial motives to suggest unnecessary treatments. but i fear, by the time we'll have models powerful enough we'd be _theoretically_ able to trust their expertise, the financial motives will have shifted to the model creators.
[-]
- imdsm 2 hours ago
  [flagged]
rafterydj 19 hours ago
I feel like I'm going nuts.
There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice.
I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know the subject the AI is being asked about that you are likely to find the output helpful.
This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
[-]
- dang 14 hours ago
  (We detached this subthread from https://news.ycombinator.com/item?id=48709121.)
- appplication 19 hours ago
  This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.
  It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
  Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear if the current transformer architectures with their probabilistic/hallucinatory outputs will plateau before they surpass current experts’ abilities in all promised fields.
  [-]
  - cheschire 17 hours ago
    I was a very early adopter of AI in my circles and I shared it with many people. Strangely, I seem to be the most skeptical about AI in my circles as well, but because I was the gateway for many folks, they want to come back and share their experiences with me.
    And it's so much like listening to someone in a church congregation sharing their experiences with god. Clear and obvious gaps are hand-waved away exactly how you're describing.
  - operatingthetan 17 hours ago
    >This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.
    Treating it as if it is an intelligence is the problem.
    The problem is that AI psychosis is fundamentally the belief that an LLM is "thinking" at all. Outputs are just believable word vomit which resembles factual information.
    [-]
    - singpolyma3 17 hours ago
      That presumes that we have a definition of "thinking" or that we know that anything is "thinking" when in fact neither is true.
      The problem is real but I don't think positing a philosophical root is helpful
      [-]
      - operatingthetan 17 hours ago
        The claim that we are assigning human-like agency to a machine with none is simple and factual.
        [-]
        ForceBru 16 hours ago
        What's "thinking"? What's "agency"? What's "human-like agency"?
        If "agency" is making decisions and performing corresponding actions in the real world, then LLMs most definitely LOOK LIKE they're making decisions (what's the next token? which tool to use? what's to say, in general? what idea to convey?) and performing actions (tool use). Can we tell whether they are ACTUALLY making decisions? Well, are the people around me "actually" making decisions? Or are they simply pushed around by circumstances and external forces?
        Am I actually making decisions? Did I like DECIDE to write this comment? Maybe? I have no clue...
        [-]
        operatingthetan 16 hours ago
        I think you're mildly obfuscating the issues at hand by diving too deeply into philosophical questions.
        It's quite simple, the agency that the LLM appears to have is actually your own. Without a prompt an LLM does nothing. It has no thoughts between prompts about you or your problems.
        [-]
        ForceBru 15 hours ago
        Yes, I'm diving a bit too deeply because I don't really know what "thinking" is and therefore I don't understand how we can so confidently say that LLMs don't think, even though they definitely LOOK like they're thinking. They even have a "Thinking" section in their responses! If I say that a rock doesn't think, it's pretty convincing: does a rock look like it's thinking? No — it doesn't even do anything! But an LLM does look like it's thinking, at least while generating a response. When it's "offline" it's just a bunch of "dead" bytes, sure.
        So when it's not active, not responding to a prompt, it's of course not thinking. I'm pretty sure nobody actually questions this. Is your computer "thinking" when it's powered off? Can a piece of metal think? Probably not. So there are no thoughts between prompts, this seems obvious.
        Thus, this is a question of "discrete time vs continuous time". LLMs "live" from prompt to prompt. Humans are alive continuously. In some sense, we're prompted by a lot of things all the time. As I'm writing this, I'm seeing stuff, I'm hearing stuff, I can feel various parts of my body, I'm thinking about my problems, my goals, other people's problems and goals, etc. When I'm in a sensory deprivation tank, my brain keeps "entertaining" me by "self-prompting", like a recurrent neural network (I guess it literally is a massive RNN).
        So it seems like your definition of "thinking" hinges upon the LLMs being discrete-time and single-threaded (can't think about multiple things in parallel).
        IMO a more interesting question is whether an LLM is thinking WHILE IT'S GENERATING A RESPONSE, while it's "alive".
        [-]
        operatingthetan 15 hours ago
        I want to say I really appreciate that you are putting a lot of thought into this, you certainly have interesting concepts here. However I think it seems a bit far off from the discussion I'm trying to have, and I do not have the bandwidth to fully understand and charitably respond to your points.
        Shitty-kitty 15 hours ago
        We don't know what thinking is but pattern matching is definitely a big part of it. That's why people see Jesus on a piece of burnt toast.
        aspenmartin 15 hours ago
        You are implying definitions that don't seem to be mainstream; thinking is internally manipulating information to reason, infer, plan, solve problems, and form judgments or beliefs. Also -- "Without a prompt an LLM does nothing. It has no thoughts between prompts about you or your problems." it sounds like you paint this like it's something fundamental? It isn't. Nothing is stopping you from streaming information to an LLM and letting it process this information, this is precisely what people are trying to build.
        [-]
        operatingthetan 15 hours ago
        The machines have no driving force to act in the world. That is fundamental for humans.
        Twice in your comment you suggest things that you think that I believe, please do not do this.
        [-]
        aspenmartin 12 hours ago
        “It sounds like you believe” is a question, inviting your clarification. I will continue doing that because it’s perfectly reasonable. Also “machines have no driving force to act in the world” is also a mysterious statement but because you reacted so badly to anyone questioning you I will just leave it at that
        [-]
        operatingthetan 10 hours ago
        That is called a leading question and it is not "perfectly reasonable." Resisting your attempts at bad faith discussion is not "reacting badly." I agree though that we should cease discussion.
        singpolyma3 15 hours ago
        The idea that humans have agency is supernatural thinking imo
        [-]
        operatingthetan 15 hours ago
        A free will versus determinism argument doesn't really have a place here. Consider instead that humans factually have 'the illusion of agency.' The LLM does not even have that. It cannot act on its own, it has no ongoing drama or intention. It only reacts to prompts.
        keeda 14 hours ago
        Wait, where are we assigning human-like agency in this case? Agency to me means the ability to do something by itself. Here the LLM is not doing anything, it is just responding with information to queries from people, that those people may then act on. (Which you can say about Google searches too, yet we don't ascribe agency to Google.)
    - aetherson 16 hours ago
      You're confusing the training method with the internal process. If I had you repeatedly attempt to learn how to make believable completions of partial documents about a given topic, you would eventually learn things about that topic and could use your knowledge to create more believable completions of documents about that topic.
      [-]
      - operatingthetan 16 hours ago
        LLMs do not learn. You put it out to pasture and create a new one. "Memory" in a session is essentially a context window party trick.
        [-]
        aspenmartin 15 hours ago
        They do learn in context, and very sample efficiently. Continual learning is an active area of research and we sort of already have something resembling it with persistent context. So yes they do learn.
        [-]
        operatingthetan 15 hours ago
        I consider that to be the illusion of learning. You are not wrong, I think they may actually learn in the future though. But not today.
        [-]
        aspenmartin 15 hours ago
        That’s strange to me, what would you define as learning?
        [-]
        FromTheFirstIn 15 hours ago
        To acquire new knowledge and build your understanding. They don’t understand so they can’t learn
        [-]
        operatingthetan 15 hours ago
        Thank you for saying succinctly what I could not. If your consciousness and knowledge fundamentally does not change from your ongoing experience, then you are not learning. This is how the LLM currently functions.
        [-]
        aspenmartin 13 hours ago
        You’re describing the problem of continual learning. As I said, their “consciousness” (for lack of a better term) and knowledge do already change from ongoing experience in context, which is another way of saying for only a short window, today. They are ephemeral, sort of, but that’s a temporary limitation.
        [-]
        FromTheFirstIn 11 hours ago
        I think if your definition of consciousness can fit these things then you’re more open minded than I care to be. Consciousness isn’t really guessing the next thing to say- it’s hard to say what it is, obviously, but blindly feeling forwards with each new conversation doesn’t seem like consciousness or learning to me.
        [-]
        aspenmartin 10 hours ago
        We aren't talking about consciousness, we're talking about learning.
        > Consciousness isn’t really guessing the next thing to say-
        I don't know what consciousness is either and these debates are a dumpster fire when they happen, but it sounds like you're pulling forward this "LLMs are just predicting the next token" (true by construction) implies that they can't learn or reason or be conscious (2/3 are wrong, the last one isn't falsifiable without a useful definition).
        [-]
        FromTheFirstIn 7 hours ago
        Yes, I think we simply disagree. I think you know what LLMs are based on our other thread and if you think something important happens when you get a large enough context vector I don’t think there’s much I can say to change your mind. It seems unlikely to me!
        aspenmartin 13 hours ago
        “They don’t understand” is a strong statement, maybe true but depends on what you mean by understand. What is your definition of this? I can’t think of a meaningful definition of “understand” that doesn’t apply to LLMs
        [-]
        FromTheFirstIn 9 hours ago
        We’re in all the same threads lately!
        chiply314 15 hours ago
        They already learned. A lot, or basically everything ever written and available digitally.
        And context windows work very well. You can 'teach' an LLM a new programming language and other things through them.
        aetherson 15 hours ago
        They learn during training, which is what we're talking about.
        [-]
        operatingthetan 15 hours ago
        >which is what we're talking about.
        You are anyway, I don't see anyone up the chain saying that.
        lemiffe 15 hours ago
        The LLM itself doesn't, but agents can research, compare, add to their memory, and use that to narrow the results down to a probabilistically higher set of outputs; I have used an LLM for my own MRI results and it was nearly spot-on, verified by a subsequent visit to a specialist. YMMV as they say. But I do believe we are entering the era where LLMs are considering past interactions and long context windows to inform it of personal preferences and history in order to output more accurate results.
      - goodpoint 15 hours ago
        believable != true
        [-]
        fhdkweig 15 hours ago
        This is what Stephen Colbert called "truthiness". People want to believe what they feel is true even if it is directly contradicted by evidence.
        https://en.wikipedia.org/wiki/Truthiness
        operatingthetan 15 hours ago
        A very important callout. It's the crux of the whole thing really. Humans are easily susceptible to deception by statements that are structured to be believable.
        aetherson 15 hours ago
        Sure. But that's not the subject.
        [-]
        operatingthetan 15 hours ago
        Please stop trying to police what the subject is to suit your own arguments.
    - corndoge 16 hours ago
      Often times the words produced do have legitimate factual information though. It's less psychosis and more a confluence of well known human tendencies - salience bias, automation bias, etc.
      [-]
      - operatingthetan 16 hours ago
        The big problem is often times they don't as well. That's why we can't rely on them.
        [-]
        aspenmartin 15 hours ago
        Same with humans? Doctors, scientists... if a tool has any error rate above zero it's not reliable?
  - lazide 18 hours ago
    I don’t think they will improve, there is too much incentive to poison the datasets going forward.
    A lot of the models up to this point have benefitted - like Google did - from an essentially ‘pre SEO’ internet.
    Now the same tools are being used to generate nigh infinite good sounding bullshit, which poisons the dataset in all sorts of hard to detect ways.
    To add insult to injury, the human experts are also not as naive, and have many incentives to poison their own input in subtle ways too.
    [-]
    - brokencode 17 hours ago
      I seriously doubt that data set poisoning will be a real limiter in model performance.
      For one, if your website/book is poisoned, who is going to trust it for anything at all, much less for training models?
      For two, all the major AI labs hire or contract for subject matter experts to create curated data sets, evaluate model performance, etc.
      Unless they hire malicious experts, this will provide a growing, high quality data set that should drown out any poisoned pretraining data.
      [-]
      - chmod775 16 hours ago
        There's a post every other month where some dude who put nonsense information online celebrates because it actually ended up in some frontier model's weights.
        If it's easy enough that some randos can do it for fun, what do you think happens when there's commercial interest behind it?
        Obviously companies are going to try nudging AI towards recommending whatever they're selling. It's a logical extension of SEO - and that's a 100 billion USD industry.
        Additionally, if I believed myself to be in some sort of spending - err - AI race, I'd try to poison the data sets of my competitors by putting crap out there for others to ingest.
        [-]
        aspenmartin 16 hours ago
        It's not really a problem. We're out of natural tokens anyway. The future is synthetic verifiable traces (already the way we train coding agents).
        [-]
        maxnevermind 14 hours ago
        > synthetic verifiable traces
        What does that mean? Is it like when somebody uses a coding agent to develop a feature, and later the input prompts and the resulting PR can be used for training, on the presumption that the final PR was a correct implementation of the prompt?
        [-]
        aspenmartin 13 hours ago
        Yea it’s rejection sampling, so you have an agent, you take a verifiable problem (people use lots of different verification signals but say unit tests etc) and have the agent attempt it K times. You accept the trajectories (all context, tool use etc, the entire log) that are positively verified and use these as training examples.
        The trick is to find the examples that are just in between too difficult and too easy for the existing agent, these have the strongest training signals
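        A minimal sketch of the loop described above, under stated assumptions: `toy_attempt` and `toy_verify` are hypothetical stand-ins for an agent and its verification signal (say, unit tests), and the 0.2 to 0.8 pass-rate band is an illustrative threshold for "neither too easy nor too hard", not anything a lab has published.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    problem: int,
    attempt: Callable[[int, random.Random], int],
    verify: Callable[[int, int], bool],
    k: int,
    rng: random.Random,
) -> Tuple[List[int], float]:
    """Run the agent k times; keep only verified attempts and report the pass rate."""
    attempts = [attempt(problem, rng) for _ in range(k)]
    accepted = [a for a in attempts if verify(problem, a)]
    return accepted, len(accepted) / k


def is_informative(pass_rate: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Hypothetical filter: keep problems the agent solves sometimes, but not always."""
    return low <= pass_rate <= high


# Toy stand-ins: the "problem" is doubling a number; the "agent" is right about half the time.
def toy_attempt(problem: int, rng: random.Random) -> int:
    return problem * 2 if rng.random() < 0.5 else problem * 2 + 1


def toy_verify(problem: int, answer: int) -> bool:
    return answer == problem * 2


accepted, pass_rate = rejection_sample(
    21, toy_attempt, toy_verify, k=8, rng=random.Random(0)
)
```

        In a real pipeline each accepted item would be a full trajectory (context, tool calls, the entire log) rather than a single integer; only the accept/reject structure carries over from this toy.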
        jurgenaut23 15 hours ago
        Do you have examples of such celebrations?
        brokencode 14 hours ago
        There are so many better data sources that AI labs can use here that this argument really holds no water at all.
        Peer reviewed journals, textbooks, in-house teams of experts, trusted news publications, etc.
        The whole idea of scraping large swaths of the internet for training data has always been pretty dubious due to the variable data quality.
        I mean, just look at the early Google models that told people to put glue in their pizza due to a joke in the training set. Garbage in, garbage out.
        This is one of the first and most obvious problems all of these labs have run into, and countermeasures are only going to improve.
        [-]
        lazide 14 hours ago
        But they don’t, generally. Which is why it is a great argument, because it’s easy to falsify - and see it is what is actually happening.
        Also, those other sources are getting buried in AI slop too.
        [-]
        brokencode 13 hours ago
        The question is not whether it has happened or will continue to happen. Of course it will always be a problem to some extent.
        Your original claim is that this will be enough of a problem to prevent models from improving in expert level knowledge. I completely disagree with this premise.
        If the models fail to improve, it will likely be due to limitations in the transformer architecture rather than poisoned training data.
        And even then, I doubt that the transformer is the best architecture we will ever come up with.
        Clearly it doesn’t learn or think like a human does, since humans don’t need many gigabytes of text samples to learn to talk, so there is some room for improvement.
        [-]
        lazide 13 hours ago
        https://arstechnica.com/science/2025/01/its-remarkably-easy-...
        [-]
        brokencode 13 hours ago
        Great, an article about Llama 2 from early 2025. That doesn’t at all invalidate what I said.
        [-]
        lazide 12 hours ago
        While completely ignoring the fundamental reason. Whoosh.
        [-]
        brokencode 11 hours ago
        Not sure what point you’re trying to make.
        Shitty-kitty 14 hours ago
        They already are. It has become a real problem on Reddit, especially with the latest in pseudo-science crap like peptides.
      - Analemma_ 17 hours ago
        I think you underestimate just how much money is being poured into LLM SEO at the moment. It's real quiet because they don't want to draw attention and countermeasures from the frontier labs, but this is getting huge investment, and they will have a monomaniac focus on juicing product results whereas the attention of the labs necessarily has to be spread out.
        [-]
        aspenmartin 16 hours ago
        Data curation is important and expensive and frontier labs can afford to do it right. Natural data isn't the limitation, we are already literally out of tokens. It doesn't matter how much you poison things it's not going to stop the progress train.
        tayo42 16 hours ago
        Who's doing LLM SEO right now? How does that work when you only get feedback every few months when a new model is out?
        [-]
        natebc 16 hours ago
        I'm pretty sure the Optimization part is just ... not present at all.
        This is how we get LLM summaries presenting something mentioned once by some nutjob in a reddit thread as bona fide FACT
        DougN7 16 hours ago
        Look at G2.com - they found their website is highly referenced by AIs and they are leaning into it hard.
      - microgpt 17 hours ago
        Pretty easy to display one thing to verified browsers (just latest few user-agents from the 10ish different mainstream browsers on the 3 main OSes) and another to anything else.
        Yes AI scrapers can easily spoof user-agent, but they fall out of date as the browser updates.
        Bit harder to catch them in tarpits and then serve nonsense to whoever triggered the tarpit.
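        A rough sketch of the user-agent check described above. The browser names and minimum versions in `MIN_MAJOR` are placeholder assumptions, not real current release numbers, and the check relies entirely on a header the client controls.

```python
import re

# Hypothetical allowlist: minimum major version treated as "current" per browser.
# These numbers are placeholders and would need updating as browsers release.
MIN_MAJOR = {"Chrome": 120, "Firefox": 120}


def looks_like_current_browser(user_agent: str) -> bool:
    """Return True if the User-Agent names an allowlisted browser at a recent version."""
    for name, minimum in MIN_MAJOR.items():
        match = re.search(rf"{name}/(\d+)", user_agent)
        if match and int(match.group(1)) >= minimum:
            return True
    return False


def choose_variant(user_agent: str) -> str:
    """Pick which version of the page to serve for this request."""
    return "real" if looks_like_current_browser(user_agent) else "decoy"
```

        Anything that fails the check, including real users on old browsers, would get the decoy page.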
        [-]
        thfuran 17 hours ago
        >Yes AI scrapers can easily spoof user-agent, but they fall out of date as the browser updates.
        It’s a hell of a lot easier for a company to ensure that its scrapers all report the latest user agent string than it is to get everyone and their mother to update their browsers in a timely fashion.
        [-]
        microgpt 13 hours ago
        yeah but unless everyone is checking the version, if it's just a handful of websites checking it, they don't.
        and browsers forcibly auto-update
    - rvnx 18 hours ago
      Human doctors use LLMs to diagnose too
      OpenEvidence claims
      "More than 40% of U.S. physicians use it daily, and it handled around 20 million clinical consultations per month. Over 100 million Americans were treated by a doctor using it in 2025."
      https://www.cnbc.com/2026/01/21/openevidence-chatgpt-for-doc...
      [-]
      - something98 17 hours ago
        This is a very misleading statement; most of those physicians are using LLMs to transcribe notes from visits and/or for billing purposes (e.g., proper billing codes).
        [-]
        kjellsbells 16 hours ago
        The problem isn't LLMs per se, it is the shift to trusting the output of the machine coupled with a decline in verifying that the output is reasonable. It's basically what your teachers warned you about with Wikipedia in eighth grade, except applied to all areas of life, including medicine. Dictation is already high-stakes and LLMs do not automatically reduce that risk.
        Here is an example. My provider sent me this note. I'm quoting verbatim here from my MyChart record:
        "Your liver enzymes are high, I would like to order acetaminophen containing medication like Tylenol, I would like to order liver ultrasound I placed ultrasound order in the system, make an appointment for radiology, I would like you to get hepatitis panel lab work done, obtain blood work order, please schedule a well visit to get it done"
        When I queried it, this is what I got back. It was a dictation error. You could almost hear the panic in the message:
        "Sorry for wrong message earlier, I was dictated message- so could not realize that it was written to take Tylenol type of medicines- I DO NOT RECOMMEND ACETAMINOPHEN CONTAINING MEDICINE - LIKE TYLENOL AND ALCOHOL DUE TO ELEVATED LIVER ENZYMES."
        Again the problem is not dictation, or LLMs. The problem is humans ignoring their responsibility to check the output of a machine.
        [-]
        ethbr1 15 hours ago
        > Again the problem is not dictation, or LLMs. The problem is humans ignoring their responsibility to check the output of a machine.
        100%. Also, management.
        I wish someone would go ahead and coin an AI version of Amdahl's law that states the work speedup from AI is dependent on amount of unverified AI output used.
        Iow, if you 1:1 verified everything, there would be no time savings.
        Ergo, you get management saying (1) we demand time savings due to AI & (2) we demand you fully check anything you use AI for.
        End result? People skip (2) to hit (1).
        Then management burns anyone at the stake whenever inevitable mistakes happen.
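        A back-of-the-envelope version of that idea, as a toy model rather than an established law. All four parameters (`p`, `s`, `v`, `c`) are assumptions introduced here for illustration.

```python
def ai_speedup(p: float, s: float, v: float, c: float = 1.0) -> float:
    """Overall speedup under an assumed toy model.

    p: fraction of the total work handed to the AI
    s: how many times faster the AI produces that fraction than a human would
    v: fraction of the AI output that a human verifies
    c: cost of verifying, relative to doing the work by hand (1.0 means "1:1")
    """
    # Time is normalized so that doing everything by hand costs 1.0.
    new_time = (1 - p) + p * (1 / s + v * c)
    return 1 / new_time


full_check = ai_speedup(p=1.0, s=10.0, v=1.0, c=1.0)   # verify everything 1:1
no_check = ai_speedup(p=1.0, s=10.0, v=0.0)            # verify nothing
cheap_check = ai_speedup(p=1.0, s=10.0, v=1.0, c=0.1)  # verifying costs 10% of doing
```

        Under these assumptions, full 1:1 verification yields a speedup below 1, skipping verification yields the full 10x, and the gain in between depends on how cheap verification is relative to the work itself.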
        [-]
        lazyasciiart 15 hours ago
        But that’s trivially false. There is an entire category of work where it is hard to come up with an answer and easy to verify the answer, which means that if you verified everything there would still be a large time savings.
        [-]
        ethbr1 14 hours ago
        I would question whether that holds in the practical LLM automation space.
        Can you think of any real life examples where an LLM is likely to be used?
        I think in practice what you're saying is there are problems where there exist efficient deterministic verification methods, and I'm sure that's true.
        But that's not the bulk of everyday work LLMs are being asked to do nowadays across industry.
        girvo 15 hours ago
        Which is itself a problem as (in my partner's evaluations as an optometrist) LLMs used for clinical notes have a bad habit of dropping clinically important information, and the biggest providers don’t give you a copy of the raw transcript or a recording
        Which means she ends up spending just as much time as if she’d done it herself as it needs to be verified for accuracy every time…
        brokencode 17 hours ago
        OpenEvidence is specifically meant to help clinicians make evidence-based decisions in the diagnosis and treatment of patients, not note transcription.
        [-]
        sxg 17 hours ago
        It does both: https://www.openevidence.com/user-guide/visits-overview
      - sarchertech 17 hours ago
        Ignoring the fact that this number comes from a company press release, it doesn’t say anything about the number of doctors using it to diagnose, just that they use it.
        If a physician uses Google to search for a dosage chart for some drug they rarely prescribe, you wouldn’t say they are using Google to diagnose the patient. You wouldn’t say that either if they used Google to search for the most recent studies on a topic.
      - sambellll 17 hours ago
        To me this is like a good software engineer using AI.
        The fact that they use it doesn't make what the result is any worse or less trustworthy - arguably it makes it better.
        It only becomes a problem if they offload all of the thinking to AI.
  - sublinear 17 hours ago
    Human expertise is also improving all the time and not limited to just connecting dots. When AI seems to surpass a particular human, it's just because the human lacks broader knowledge and fails to investigate further.
    An expert already knows they don't know everything. That was never the point. Critical thinking cannot be delegated to AI any more than it can be delegated to a book. There is nothing new going on here.
  - perching_aix 15 hours ago
    > There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks
    Do you think it is any more possible to have a proper discussion with someone who preemptively paints the other person as mentally ill? Or someone who preemptively victimizes themselves?
    Cause I don't think these are the hallmarks of an honest discussion. See also the entire past decade of political discourse.
    Like, consider this:
    > It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
    A trivial counter to this is that you can just be an expert at something (e.g. your own work), use the damn thing yourself (professionally), and evaluate the outcomes for yourself. Then maybe remark "LLM good".
    Now you come and remark "LLM bad", and point at random "evidence", either of outright other workloads, or even the one at hand: you're asking someone to reject the reality they've already experienced, entirely based on the assumption that they're "merely religious" or "in psychosis". You tell me if that's any more epistemically rigorous and sensible than their story.
  - TomasBM 17 hours ago
    Why is it psychosis and not lower standards?
    While I can understand being skeptical of non-experts' claims that such answers are enough, I don't understand why you call it "psychosis" and not simply naivety or lack of expertise.
    At the same time, the new so-called "models" haven't been pure transformer-based LLMs, but entire systems with tools (with access to the Internet), data storage, and the options to trigger additional instances for different tasks.
    [-]
    - janmatejka 17 hours ago
      Because some people develop actual psychosis. They go down some rabbit hole with an LLM until the LLM makes them believe they invented a new kind of physics, which makes them go harassing experts, who obviously try to ignore them because it's all nonsense.
      [-]
      - ruszki 16 hours ago
        For me, what others said and literally showed with Claude Code, et al, and what I’ve been experiencing with it, clearly signal way lower standards. But this was true even before LLMs.
      - shimman 16 hours ago
        Reminds me of that clip of Travis Kalanick, sexual deviant and harasser of women, talking about "discovering new physics."
        [-]
        natebc 16 hours ago
        The Uber guy? Yeah that was a painful watch.
      - perching_aix 15 hours ago
        Graciously diagnosed for them by random unqualified people on the internet with an agenda, frequently before even any relevant interaction:
        "Oh you like LLMs? You must be in AI psychosis!"
        Let's not pretend it is anything more than the run of the mill wet fart of a culture war label. It's quite literally the "TDS" of the anti-AI crowd.
        [-]
        doawoo 15 hours ago
        That's really not the argument being made here, and you're panning it further by claiming this is staunchly anti-LLM.
        The idea here is to signal that you can absolutely use LLMs to help you figure something out. But also, they're wrong a lot. So use your own brain too.
- qnleigh 18 hours ago
  Totally agree. I'm a scientist, and like most scientists I have some specialized skills that most of my colleagues don't. AI has empowered them to learn and build things that they might have otherwise needed me for. But there have been quite a few cases where it led them very far down a wrong path. This has started happening way more often in the last few months.*
  We've known since the beginning that AIs confidently say incorrect things. But now that they can speak confidently about very complex topics, and mostly say correct things, we are letting our guard down and lots of subtle falsehoods are slipping through.
  *In one case, I was able to put things back on track because the AI suggested my colleague talk to me; somehow it figured out we were co-workers.
  [-]
  - aspenmartin 15 hours ago
    Right, but hallucination rates have been decreasing consistently with every model iteration. It's about error rates. As a fellow scientist, I will also mess something up. Humans have an error rate. Once that error rate is low enough, it doesn't matter that it's > 0, it matters that it's low enough to be trustworthy and useful. Coding agents of 2024-25 had error rates too large; you couldn't meaningfully vibe code anything and needed a ton of oversight. It's still true but FAR less so, and this is after like a year of iteration.
  - bitlad 17 hours ago
    >very far down the wrong path.
    Absolutely agree. Have seen this first hand
- sxg 19 hours ago
  I see your argument, but it's not exactly news that an expert found a flaw in a popular tool. You could say the same about Wikipedia--experts have tons of issues with it, but Wikipedia still provides value to non-experts. The most likely alternative to Wikipedia for non-experts is simply not trying to learn anything new.
  Similarly with LLMs, you can't just write them off entirely because they sometimes provide misleading or incorrect advice. The positive utility maximizing view is to learn when you need to call in an expert. I recently moved in to a new house and have used Claude extensively to figure out basic things (e.g., adjusting the garage door height, how to mount a TV). However, when the HVAC suddenly stopped working, I gave Claude a shot for an hour and tried some non-destructive fixes, but then realized I had to call in an HVAC expert.
  [-]
  - ohyes 18 hours ago
    The free alternative to Wikipedia is the library, not “don’t learn anything new ever”.
    I find Claude is surprisingly similar to a confident but incorrect coworker, with the benefit that Claude will reevaluate when I correct it.
    [-]
    - sxg 18 hours ago
      I used the phrase "most likely alternative" intentionally. The library is where people should go to get answers in a world without Wikipedia, but the vast majority of people won't. So in practice, most non-experts either learn from Wikipedia or don't try to learn anything at all.
      [-]
      - ohyes 17 hours ago
        Sure, if we’re going to go that broad. People are already leaning heavily towards learning nothing instead of using Wikipedia.
        I guess to me it has to be comparable to be an alternative.
        Like, I don’t consider doomscrolling x an alternative to reading Wikipedia but I might consider it an alternative to CNN, even though they’re all technically and very broadly activities that I could use to inform myself.
        In that same way I don’t consider the multitude of ways I could use my free will necessarily alternatives to each other even though they technically are. It kinda sucks but going that broad feels to me like it breaks the concept of alternative and makes it kind of meaningless.
        [-]
        sxg 17 hours ago
        I get what you're saying, but I'm not deciding what should and shouldn't count as an alternative to X. I'm trying to answer the counterfactual: how do people behave in an alternative world without Wikipedia but otherwise identical to our world?
        [-]
        ohyes 5 hours ago
        I mean, technically… I’ve lived in that world. You just go back 25 years. We had the Dewey decimal system (card catalog) and the library, hard copy encyclopedias. You could also ask someone else.
        Then we had computerized encyclopedias and search engines that searched the library.
        I mean, you had to work for the knowledge. Sometimes you didn’t know something and no one else knew either, so you had to wait until you got a chance to find out, but you would think about it and sometimes you would be right when you found a reference source.
        I’ll also note, Wikipedia is a secondary source. It is not a reliable source of truth. It is more like the ‘ask someone else’ alternative than anything else, it’s just ‘someone else’ is a person on the internet who writes Wikipedia articles.
    - bflesch 18 hours ago
      Claude will do everything to retain you as a user, because that's one of their most important metrics.
      [-]
      - ohyes 17 hours ago
        Excellent point, my colleague has the exact opposite incentive.
  - frereubu 18 hours ago
    Slightly OT Nitpick: in regard to experts and Wikipedia, when doing a neuroscience-adjacent MSc, experts in the field actually directed me to Wikipedia as an excellent source for high-level neuroanatomy, including recent research, so I'm not sure your blanket description about experts and Wikipedia is correct.
  - Applejinx 15 hours ago
    You 100% can write them off entirely and go about your business as you previously had done. Ignoring the errors, it is very debatable whether there are even productivity gains beyond: human programmer or whatever is excited and cranked up to unsustainable degrees of activity and thinking to 'keep up' with what he thinks is an AI doing the work.
    I'm seeing this fairly often and when it isn't garbage it's a capable person who has gotten inspired by their 'collaboration' in which the busywork is being done by a machine, but they're doing so much directing and correcting that it's not unlike what would happen if they got heavy into meth and went on a tear.
    You absolutely can write them off entirely and decide for yourself what level of human-killing speed-freakism you're comfortable pursuing in your productivity. There's a long history of humans managing astonishing levels of productivity through self-destructive means. This is not even cheaper, once the 'first one's free' wears off: it's just a novel method of getting humans to burn themselves harder in the belief that they have a magic feather.
    The ones who're really throwing themselves into the situation are the ones who'll burn out, but who aren't setting themselves up for atrophy and learned helplessness. Anyone who believes the technology lets them be a lazy manager just getting paid, is in for an unpleasant discovery.
- sbarre 19 hours ago
  > Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
  Yes, this is exactly so. AI is able to confidently sound plausible enough to convince laypersons or anyone who isn't very familiar with the subject matter, which is a big part of the mass-appeal "magic" of ChatGPT and other similar tools. It's like having a know-it-all friend (who also makes shit up to bridge their own knowledge gaps).
  In many non-advanced non-specialized situations, AI is right enough to be at best useful or at worst not harmful (usually landing in the middle somewhere).
  But speaking for myself, in areas where I consider myself quite proficient, I can very easily spot the subtle inconsistencies and naive conclusions that AI responses provide, and I have to guide/steer/correct it a lot to get good results when the subject matter is complex enough.
- meowface 18 hours ago
  I may be missing something, but I think it's unclear that the parent poster here is necessarily actually contradicting anything the AI said. It may depend on the exact information the OP wrote to Claude and GPT. The full transcripts would be needed. (Though there is definitely a separate point that a doctor would generally better know all the right questions to ask, while current LLMs may be making certain assumptions.)
  The LLM may have, from its "perspective", implicitly thought the OP was telling it that he had strong reason to believe there was no calcification and was not considering the bigger picture of possibly receiving an incomplete/poor assessment from the medical staff. In fact, the issue here may be the LLM overly trusting doctors vs. trusting its own expertise.
- david-gpu 15 hours ago
  Last week I went to a highly-specialized tertiary clinic about further treatment for a rare medical condition that I was diagnosed and treated for as a child. The two very specialized doctors I met there confirmed a diagnostic mistake that a specialist had made ten years ago. The only reason I pursued a second opinion, ten years later, was because Google Gemini had explained to me that the specialist ten years ago had performed the wrong type of test for my condition.
  Do these LLMs make mistakes? They sure do, I see it all the time. But they can also help people make breakthroughs.
  And this isn't the only time that Gemini has helped me diagnose long-term health issues, either.
  I am not advocating to trust anything they say blindly, but they can be a great place to form new hypotheses and learn the right terms to look for when you are unfamiliar with a subject.
  [-]
  - wasabi991011 15 hours ago
    Can you elaborate on how you use Gemini to diagnose long term health issues? Considering doing the same for myself, but I have no idea what is too much vs too little information, and generally the type of prompt engineering to do.
    [-]
    - david-gpu 11 hours ago
      Some folks are not going to like what I am about to say, but what I do is write down as much information that I think may be relevant as possible, trying to avoid leading the witness with any of my preconceived ideas of what may be going on. At the end, I encourage them to ask me questions to get a more complete picture of what may be going on.
      After a couple of rounds of that, a picture will start to emerge. The AI will make a few XYZ hypotheses of what may be going on, some of which will make more sense to you than others. This is when you can start searching some of those terms in places like pubmed.ncbi.nlm.nih.gov, including, for example, the diagnostic criteria for XYZ.
      One of the ways I often use these AIs, not just in the context of finding possible diagnoses, is requesting them to make the case for and against hypothesis XYZ based on the data you have personally collected. Again, it's not about fully buying everything that comes out of them, but it can help you consider angles or possibilities that did not occur to you, or that you had previously accepted/discarded without sufficient evidence. Think of them as that quirky acquaintance that knows a little bit about everything but sometimes misremembers, rather than as a god-like oracle.
      And don't do all this in a single session/context. Start a new context every now and then, because otherwise it tends to go in circles as these AIs are biased towards agreeing with whatever it is you said most recently. Intentionally challenge yourself, re-evaluate the existing data from other perspectives.
      Sometimes what you learn is not pleasant, but as more data becomes available, you learn to accept it. Good luck.
- nlawalker 19 hours ago
  > no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information
  "Be wowed by the convenience and speed", or merely "take advantage of the mere availability"? What most people find to be damning about expert advice is that they simply can't get it anywhere, at any cost that they can afford.
  [-]
  - whatever1 19 hours ago
    So if you want to do a surgery but you don’t see any surgeons around, you ask a grocery butcher to have his way?
    [-]
    - sxg 18 hours ago
      In certain circumstances, the answer is yes. If an airplane's pilots are incapacitated, do you simply give up and crash the plane because there are no other pilots on board? Or would you rather have someone on the ground try to coach a passenger into at least attempting to land the plane?
      [-]
      - ChrisMarshallNY 18 hours ago
        As long as that passenger didn’t have the fish.
        [-]
        acheron 13 hours ago
        Yes, I remember, I had lasagna.
      - frereubu 18 hours ago
        That's an extreme edge case, which I don't think is in the context of the concerns in this thread.
        [-]
        sxg 17 hours ago
        The specific case doesn't matter--it's meant to make you think about the general question throughout this thread: when an expert isn't available, should non-experts use AI (or other tools) to help themselves? Sometimes the answer is yes because the potential benefits outweigh the potential harms (if any harms exist). But sometimes the answer is no because misleading/incorrect advice can cause a net harm.
        [-]
        frereubu 17 hours ago
        But if the cases where AI use is a net positive are one in a million in medical situations? The argument is surely about the ratio, which many people here are arguing (from anecdote, would be interested to see a real study) is not in its favour, and the potential downsides - from both false positives and negatives - can be huge.
      - close04 18 hours ago
        A passenger crashing the plane while trying to avoid a certain crash doesn’t make things any worse. An incompetent doctor trying to save you from certain death can make things so much worse. It’s all about weighing the best/worst outcome compared to where you are now.
        [-]
        microgpt 17 hours ago
        I hate to break it to you but death is certain for everyone.
        Properly emotionally processing this fact and your complete inability to do anything about it is called an "existential crisis" and if you haven't had one or several yet, you will.
        [-]
        close04 17 hours ago
        I’m not sure what the “revelation” is? How is this related to what I said?
        Putting that aside, your philosophy sounds shallow. Death is certain, but how long you have to live and the quality of that life are not predefined. An incompetent passenger-pilot trying to save you from a crash will at worst make no difference. But an incompetent doctor can teach you that death isn’t necessarily the worst outcome.
        [-]
        microgpt 13 hours ago
        There are many healthy psychological ways to accept the certainty of eventual death. But the process is inevitably painful.
        I think the different ways people accept death explains a lot of people's psychology, like how you can guess people's attachment styles or Freudian stage fixation. For instance, billionaires who pour all their money into anti-aging research clearly are not handling it well.
      - jancsika 18 hours ago
        You can choose a) a calm, level-headed passenger who knows they aren't a pilot, or b) a calm, level-headed passenger who almost has their pilot's license but has a medical condition that prevents them from admitting when they lack certain knowledge.
        Who do you choose to be coached by an expert on the ground?
        [-]
        rvnx 18 hours ago
        No thank you, I will ask Claude and then ask ChatGPT to challenge me, and do a couple of rounds like that.
        The first: Has no clue about anything and therefore no useful knowledge and cannot challenge me
        The second one: Is proven to willfully give wrong information and will make me make mistakes for sure.
        The LLMs will do their best, even if imperfect, since they summarize what appeared in books.
        I prefer to be grounded in what the Airbus / Boeing manuals or pilot training books say, rather than in two far more unreliable sources.
    - EA-3167 18 hours ago
      People, especially in medical crises, are desperate for answers that they often can't get because their clinicians don't know. The illusion of an all-knowing guru who sounds like their doctor and tells them ANYTHING is extremely alluring. Waiting to hear back from a doctor about test results (which these days probably showed up on your online account the moment they were completed) can be agonizing.
      OK, for pain in your shoulder it might not be, but how about a woman with a lump in her breast waiting for the mammogram interpretation? How about someone trying to understand disturbing lab results? People are also often pushed these days to move through visits with doctors at a breakneck speed, but the AI will "hear you out" all day.
      Part of this is a problem with the AI, part of it a problem with our healthcare systems, and part of it is simply human nature. If you think that OpenAI, Anthropic, Google and the rest weren't aware of this going in you must have very little faith in the intelligence of their members. It's not hard to imagine that the future of LLMs should involve a hell of a lot of liability for the companies running them, but for now it's the Wild West.
      [-]
      - bilsbie 18 hours ago
        > but how about
        Whatever scenario you come up with my answer is the same.
        As an adult I’d like to be able to choose what tools I use to learn about my condition regardless of how well it works or even if it’s likely to mislead me.
        There’s risk in every aspect of life and we can’t baby proof everything.
        [-]
        baconmania 18 hours ago
        >choose what tools I use to learn about my condition regardless of how well it works or even if it’s likely to mislead me.
        Even if it "works" so poorly that you're not actually learning about your condition?
        EA-3167 18 hours ago
        If it's helping you learn about your condition then sure I agree. The issue here is that's not really the case, it's giving you the illusion that you're learning about your condition while feeding you hallucinations and half-truths at best. A recent look at medical advice from these things showed they're no better than a coin flip.
        So if you MUST have answers that are at most random guesses, I'd suggest saving a few bucks and asking a coin before flipping it.
      - perching_aix 16 hours ago
        The companies are 100% aware, yes, and so they did make quite a few changes over the years.
        Current trend is that the models will try to explicitly steer you towards "asking better questions from your medical provider", rather than providing diagnoses. They do also evaluate whether something can actually be established rather than just listen and nod along. And so the "you must have very little faith in the intelligence of their members" goes right back against these failure mode ideas.
        Now of course, given a sufficiently desperate person, they can probably torture anything they want to hear out of these models. But so can they out of actual people, so that's kind of a high bar. When you get to the point where people are willfully misreading a given piece of text, bets tend to be rather off.
    - perching_aix 16 hours ago
      No, people don't even go to a butcher, they do it themselves if they can. See the countless stories about farmers and their inventiveness. Example: https://www.youtube.com/watch?v=KKaJhQBusH8
- highfrequency 19 hours ago
  Seems natural enough. There will always be complexity and nuance that is missed by an AI model or person - the world is just super detailed. The more expertise you have the more you will be aware of that nuance. That doesn't mean the model or person is not useful as a starting point.
- scosman 16 hours ago
  I dunno. I know a lot of software engineering experts. AI isn't always right, but neither are the people, and it's getting better and better.
  Software is one domain where it excels because of structured training data and simulation environments, so I'm well aware it's better here than other areas.
  Still, there's a balance somewhere between saying it's always "insufficient or incomplete or outright misleading" and "just trust AI". AI's a useful source of information/reasoning/research, but know you need to validate its answers for important decisions.
- Aurornis 17 hours ago
  > I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do is it likely you will find the output helpful.
  I always recommend people try asking LLMs a lot of questions on something they know first. Programmers should start by asking LLMs to work on a codebase they’re familiar with first.
  You’re overstating the problem, though. Even for an expert the LLM will get a lot of things right and can be helpful under a watchful eye.
  The real problem is knowing how to identify when it’s on the right track and when you need to correct it, because both cases are presented with the same tone and confidence.
  An expert can better identify when the LLM output doesn’t sound plausible. Someone unfamiliar with the topic will think everything it says looks correct.
- kryogen1c 18 hours ago
  On the flip side of this problem, novel best practices lag the medical standard of care, other human failures like corruption and competing priorities notwithstanding.
  For example, we had to advocate for certain practices during the birth of our first child that became routine during our second several years later.
  So, neither side is guaranteed correct, doctor or citizen researcher (which did not include LLMs in my case, for the record). The truest answer is also the most useless one, applicable to all fields: it depends.
  The real question is: if you embrace being a layman, whom do you trust more: LLMs/the internet or experts, like doctors? I think the answer is pretty clearly experts.
- rapatel0 17 hours ago
  You shouldn’t expect frontier models to work on medical imaging. There is much more that goes into building a medical imaging product. First and foremost is data. Medical imaging datasets are not prevalent on the public internet at the scale necessary to have good performance on medical imaging tasks, especially MRI. Also the labels are super noisy.
  This is completely different than asking for general medical reasoning which is more derived from papers, public standards and textbooks.
  Text exists at the right scale but images don’t.
- je42 18 hours ago
  The question is how far off AI is compared to the professionals that we have access to. The world's best experts are not accessible to most of us. :(
- mattgreenrocks 17 hours ago
  You're not. This site was also bullish on using LLMs as therapists, which defeats the very point of them, and reflects a lack of knowledge on what exactly therapists do for people.
  More on topic: if the article's author arrived at a definitively negative result would this have shown up on HN?
- jstummbillig 18 hours ago
  No, not anytime someone is an actual expert at anything, AI output appears insufficient. That is why experts in various fields use AI.
  Then to say "Aha, but all of that is AI psychosis" makes obviously no sense: Why would we trust experts when they offer critique but not when they say "this is helpful"?
  Overall: People are not insane. AI makes mistakes and, often, fails completely. AI also helps them do things better, quicker, increasingly so. The jaggedness of AI is confusing and real.
  [-]
  - torben-friis 18 hours ago
    How many times have you seen an expert go "yeah these results are good consistently enough for a non expert to trust them without expert assistance"?
    There is a huge difference between having a chance of a good result, which can be useful for experts able to filter out the bullshit, and consistent success. I would generate code as a helper, I would never allow a guy from marketing to merge unreviewed AI code.
    [-]
    - jstummbillig 15 hours ago
      > How many times have you seen an expert go "yeah these results are good consistently enough for a non expert to trust them without expert assistance"?
      But see now we are talking about something else entirely than the claim that I found dubious, which was: "Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading."
      Consistently good enough !== anytime insufficient
    - hectdev 18 hours ago
      That's what I would like to call job security. When you know how to read what is wrong, you can easily catch the mistakes and correct them. AI gets you there faster by doing a lot of things right and you correct the mistakes.
    - tpmoney 17 hours ago
      I had a realization recently that the problem with "AI isn't consistently good enough" is that this experience is probably not sufficiently distinguishable from the experience most non-experts have with computer systems all the time.
      As an industry we've been promising people for decades that if they put all their data into our special software they can get all sorts of information back out that will make life easier for them, reveal new insights and otherwise improve their understanding. But the unspoken caveat has always been that you have to put the right data into the right places, in the right format, in the right way and then you have to ask the right questions, in the right syntax, with the right tools. And if you get any one of those parts wrong, you're not going to get the right answers (or possibly even any answer at all). How many people have had their Excel worksheet that they (or someone else they asked/employed) built for some task that has been working fine for the last year suddenly stop working or start throwing out nonsense numbers because some input changed? Or how many people have experienced their system seemingly throw out meaningless garbage because daylight saving time changed right at the moment the report was being run? Or spent months operating on wrong data because the person who wrote the query misplaced a parenthesis and the query was searching for "(foo AND bar) OR baz" and not "foo AND (bar OR baz)"? For most people, the computer and the programs they use to do their jobs are magical black boxes that most of the time produce mostly the right answers and sometimes get things very very wrong with no indication of what has changed. Which is effectively the same experience they will have with an AI, but now instead of needing to figure out some arcane Excel pivot table and VBA script, they can just dump some raw data and a "natural language" question into the AI.
      And that's not counting the fact that their experience with looking information up online is about the same as well. How many absolutely confident wrong takes have you encountered online for things you're an expert in? How many of those wrong takes have come straight from supposedly trustworthy sources like news companies or even other people in the field?
      For most people, using a computer has always come with the asterisk that you should always be aware that the source you're reading could be very wrong, that the output is only correct assuming all the inputs and all the parts processing that input are also correct and that everything you do should be accompanied by vetting by experts, whether those experts were software developers or domain experts. For most people the only thing that's changed with AI is that it's a one stop shop for their "probably directionally right, almost certainly wrong in the details" access to the digital oracles.
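      To make the misplaced-parenthesis case in the comment above concrete, here is a minimal sketch in Python. The records and the field names foo, bar and baz are hypothetical, chosen only to mirror the two queries quoted in the comment; this is not anyone's real query or data. It shows that both groupings run without error, and that the wrong one quietly returns an extra row.

```python
# Hypothetical records; foo/bar/baz mirror the query terms quoted in the comment.
records = [
    {"id": 1, "foo": True,  "bar": True,  "baz": False},
    {"id": 2, "foo": False, "bar": False, "baz": True},
    {"id": 3, "foo": True,  "bar": False, "baz": True},
    {"id": 4, "foo": False, "bar": True,  "baz": False},
]


def misplaced(r):
    # The query as written with the misplaced parenthesis: (foo AND bar) OR baz
    return (r["foo"] and r["bar"]) or r["baz"]


def intended(r):
    # The query the author meant: foo AND (bar OR baz)
    return r["foo"] and (r["bar"] or r["baz"])


misplaced_ids = [r["id"] for r in records if misplaced(r)]
intended_ids = [r["id"] for r in records if intended(r)]

# Rows the misplaced grouping lets through that the intended one would reject.
extra_ids = sorted(set(misplaced_ids) - set(intended_ids))

print("misplaced:", misplaced_ids)
print("intended: ", intended_ids)
print("extra:    ", extra_ids)
```

      Nothing here fails or warns; the only symptom is a row that should not be in the result, which is why this kind of mistake can go unnoticed for months.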
  - lazide 18 hours ago
    I’ve never seen an expert use AI in their field beyond the initial ‘oh interesting’ stage.
    [-]
    - inquirerGeneral 16 hours ago
      [dead]
- baxtr 15 hours ago
  This is a serious issue for young people I think.
  I have seen outputs that look good but the actual content is bad. If you’re inexperienced in a field you can’t see it because AI makes anything look right.
  I have gotten very good results with AI but you can’t take the first answer at face value. You need to be suspicious and challenging until you tweak out the right answer over time.
- xivzgrev 16 hours ago
  Well that's part of the problem. AI is not accountable - if you take its advice and hurt yourself, who is responsible?
  A real doctor is accountable.
  They might both "know" a lot of things but implicitly the party who is accountable is going to be more trustworthy.
  And I don't see that going away until AI companies must be licensed for application x and can lose their license / be sued if engaging in malpractice.
- serf 16 hours ago
  >I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
  media is awash at the moment with experts chiming in to support AI, saying their fields are being revolutionized, etc.
  it seems unsurprising to me that the layman's opinion would follow the loudest media trumpets.
- jefffoster 17 hours ago
  AI is an expert in everything you are not.
- tomaskafka 18 hours ago
  Yes. The PM’s “with AI I know enough to be dangerous, haha” means “I’m actually dangerous and I don’t realize”
- gofreddygo 16 hours ago
  This is true in broader contexts too. A bunch of experts can't agree on something fundamental which is hard to prove/disprove, and they have strong opinions on the topic.
  AI is much worse.
- jrockway 16 hours ago
  I came here to post this as my experience. AI is magical when I apply it to something I know nothing about. It far exceeds my expectations every single time. I know nothing, but here is a report with animated graphics explaining exactly what I asked it to explain!
  In fields where I'm an expert... it makes a lot of silly mistakes that are annoying and I feel like they would just cascade if I didn't correct them early. (I still think it's a net win, but... I watch it and it watches me, and we both do better work. I'd even apply the "magical" adjective when it does stuff I hate but know how to do, like edit Helm charts. What would normally be 20 minutes of me griping about YAML indentation is just a correct diff in seconds. I'll take it!)
  So with that in mind, I tend to distrust output that I can't verify. If a doctor was recommending surgery and I thought the plan was too aggressive, I'd get a second opinion. I don't expect Claude Code to have much medical diagnostic ability, as that is really not what the model is trained for, and I know how it performs on work that it's trained and fine-tuned for. That is not to say the output is wrong and that it can't have diagnostic value, just that I personally wouldn't feel safe trusting it. Wrap up the same model with fine-tuning in the domain and a harness that reminds Claude to do a lot of sanity checks, perhaps with a human in the loop to guide it back onto the rails when it gets hyperfixated on something that doesn't matter? That could very much be a useful AI product.
- pwg 17 hours ago
  > Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading.
  The term for when the press "gets it wrong" is Gell-Mann Amnesia (https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect).
  In that case, when you have personal knowledge of the facts, or know the specific domain area, you can see where the reporter mixed things up.
  AI is no different, it's just a bunch of matrix math substituting for "the reporter" regurgitating what it was previously told. So the Gell-Mann Amnesia effect would apply just the same. If you have domain knowledge, you immediately see where the AI got it wrong. When you do not have domain knowledge, you have less chance of seeing where the AI was wrong.
- parineum 19 hours ago
  > I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading.
  AI isn't even the first instance of this phenomenon, news articles are like this as well.
  https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect
- beering 18 hours ago
  TFA doesn’t actually state where the bit about shockwave therapy came from and it wasn’t the main point of the article. The concern was about being given useless therapies. The homeopathic analgesic is concerning, at least to me.
  I.e. nothing this radiologist said was related to the LLM’s advice.
- stymaar 16 hours ago
  > I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
  AI assistants are industrializing the Gell-Mann amnesia effect.
- suttontom 18 hours ago
  Your instinct is correct, and in a lot of cases it's true. However, I've heard from enough doctors by now (a cardiologist, psychiatrist, and epidemiologist/former physician) that they use medical LLMs and find them extremely helpful, mostly as a way to either bring up knowledge they'd forgotten about or as a way to learn something new and then verify it. I'm extremely skeptical about LLMs in general and the connection to Gell-Mann Amnesia is apt, but I wouldn't necessarily write them off completely like that. There are experts using the models that find them genuinely helpful in their field.
  [-]
  - GTP 18 hours ago
    Probably this is the point, and it's a point that has been brought up a lot of times in the past, maybe less in recent times: you need to know the things you're applying an LLM to. In this way, you can keep the good outputs while having the expertise to discard the bad ones.
- sevenzero 16 hours ago
  >AI output appears insufficient or incomplete or outright misleading
  It has been like this since the rise of "AI". The only people enthusiastic about it are usually the ones hoping to make a profit in one way or another.
- Hikikomori 17 hours ago
  It's like reading news articles. Seems reasonable until you read an article about something you know, then you see how wrong they can be.
- newsclues 19 hours ago
  LLM is not necessarily an expert system. Once there are expert systems for law, healthcare, accounting, governance…
  https://en.wikipedia.org/wiki/Expert_system
  [-]
  - microgpt 17 hours ago
    Didn't they try that in the 80s and 90s but discover the real world is too variable for that to work?
- meindnoch 18 hours ago
  We're past the point of Gell-Mann amnesia. This is full blown Gell-Mann psychosis.
- silisili 18 hours ago
  This is natural and even logically expected. It's just Gell-Mann amnesia in action. The world has more people spouting on things than it has people knowledgeable in said things.
  Apply that to the Internet at large, and realize where LLMs got their training. They're basically ConfidentlyIncorrect personified.
- grayhatter 17 hours ago
  > This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instanced to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
  Welcome to the club? This new awareness you've found of the true quality of LLM-based GenAI output is what "all the haters" have been mad about forever: that the output of LLMs is clearly defective, and that LLMs have merely found a cute trick for making humans think they're less defective than they are actually measured to be.
  And the corresponding anger and frustration to push the risks of genai output out onto others, while also aggressively pushing it as a feature you should be using already. You're behind don't you know, and whatever other lie I have to tell to trick you into enough FOMO to pay me 200USD/mo so I can sell FOSS back to you.
  An LLM can only output the mean next likely token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive. None of this is new. The problem is that 50% of humans are below the mean but have no idea. So when an LLM tells them some lie: well, it sounds so helpful! It's impossible for someone who sounds this helpful to lie to me, liars never sound confident! It must be PERFECT! I'm gonna tell everyone how perfect it is. So the bottom 0-33% think LLMs are fantastic tools that make nearly 0 mistakes in comparison to the bottom 33%. The 33-66%-ish aren't sure: sometimes it's great, but it will make that random mistake sometimes, "but I can catch most (or all of them, depending on ego)". And the 66%+ are angry about how many people are getting tricked by something so obviously low quality, or are lucky enough to not have to care.
  [-]
  - orangecat 16 hours ago
    > An LLM can only output the mean next likely token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive.
    So when an LLM was asked to analyze the unit distance conjecture, it just spat out a bunch of average-or-random tokens that coincidentally happened to correspond to a valid proof that had eluded humans for decades?
- stringfood 16 hours ago
  What is happening is that the gap between what the experts and AI know is getting smaller each year. This year, sure, radiologists are mocking AI's ability to interpret MRI results, but the models are a lot better at that this year than last. In five years perhaps radiologists will truly appreciate AI, but I am not holding my breath, because radiologists are notoriously slow to adapt to changes in medical science compared to other specialists like anesthesiologists or surgeons.
- redsocksfan45 16 hours ago
  [dead]
AidenVennis 5 hours ago
Personally I do this as well now with a lot of stuff. I've recently had a lot of issues with my knee, which was operated on about 20 years ago because of a torn cruciate ligament and is now acting up again. The specialist did an x-ray (which I thought was pointless because I was sure it was not bone-related). Long story short, my left knee is 20 years older than me because of increased wear and tear. I don't doubt the doctor is right, but I did doubt I had memorized everything correctly, and later on I had some more low-effort questions I didn't want to bother them with, like what movements I should avoid and specifically what training I should be doing at the gym.
Thankfully I got a pretty detailed report that I couldn't read because of all the medical terms. I fed this to Claude and asked for a human-readable conclusion, and it repeated pretty much everything the doctor told me, which was great: I now have a readable report for future reference. Secondly, I asked it questions about what movements I should and shouldn't do, and eventually I made a gym plan to improve stability and prevent any more wear and tear in my knee. Lastly, I validated this with the assigned physiotherapist, and the plan I created with Claude was perfect!
I probably won't ever use AI as a second opinion, but I would definitely use it to ask numerous silly questions that would help me in day to day life.
eqvinox 18 hours ago
> My hope is that in a couple of model generations, we'll trust AI to review MRIs the way we trust it to proofread our emails.
https://www.nature.com/articles/d41586-026-01947-1
I've started asking my doctors whether they use AI, and if they say yes look for another one.
[-]
- rmbyrro 18 hours ago
  That study seems to be confounding factors and rushing to a questionable conclusion.
  A very plausible explanation for the adenoma detection rate to have gone down is simply that its prevalence went down among the population in the second three-month period.
  This was not a randomized trial. Concluding that "AI usage degrades physicians' skills" is questionable at the very least.
  [-]
  - eqvinox 17 hours ago
    There's a whole bunch of other studies on this topic, as well as metastudies, and from what I can tell the problem is real.
    https://www.sciencedirect.com/science/article/pii/S245195882... (+ cf. its references)
- throwatdem12311 18 hours ago
  I don’t even trust AI to proofread my emails.
dazhbog 18 hours ago
You should always be getting a second or third opinion from real doctors for matters like surgeries, radiology, etc.
One doctor diagnosis + LLM is gonna throw you off. You need more datapoints.
[-]
- ChrisMarshallNY 18 hours ago
  In the US, this is standard advice. I note that the OP is in Germany. Maybe they do things differently, there.
  [-]
  - tsss 16 hours ago
    In Germany we get zero-th opinion because you can't even get an appointment within the next 8 months.
    [-]
    - xioxox 15 hours ago
      And when you meet them, if you're unlucky they don't want to hear your medical history and want you out of the door in two minutes.
  - Aurornis 17 hours ago
    The OP describes getting injected with a homeopathic botanical formulation and receiving another type of therapy that wasn’t indicated for his condition.
    I wonder if this person was going to a traditional doctor or if they were visiting some type of specialty clinic as a second opinion. For most conditions you can find specialty clinics that will prescribe and administer (and bill for) a lot of non-indicated treatments, but some patients like being in the care of doctors who take action and do things after being recommended more conservative treatments by primary doctors.
YuechenLi 16 hours ago
Yeah, one of the big problems with that is that Claude/ChatGPT doesn't perceive images the way humans do at all, so when you upload an image to them, it gets tokenized in some form. This is why most LLMs are really, really bad at spatial recognition for image editing purposes for example.
So, unless you can turn the image into a natively tokenized format like JSON or something that somehow accurately tokenizes what's on there, I would NOT trust Dr. Claude's analysis. If you want a second opinion, talk to another doctor. A human doctor.
motbus3 16 hours ago
The only part of this message that I think would be interesting to the author: what if you set up two instances to prove each other's arguments wrong, with each one reading one of the reports as its POV?
I didn't see the full process, but I have used U-Net models for tumor detection, so I am somewhat familiar with the possible caveats of any evaluation from an engineering perspective.
First, I would like to point out that, unfortunately, it is not uncommon to go to two different human doctors and also get two unreliable diagnoses and treatments. The biggest problem with the way people plan to use AI in health is the lack of liability.
A bug on a regular old web site doesn't kill anyone or cause pain and suffering (most of the time), but a misdiagnosis can, and a model is very good at presenting arguments even when it is completely wrong.
Claude Code, and I am talking about Opus 4.8 here, can tell rivers of information about code patterns and then write the poopiest code on the next line.
This is a machine that will deliver a sort of templated document based on the input information, but it is not exactly doing the work if you don't constantly direct it to do it right.
Because the model isn't thinking, I wonder what happens if you set multiple agents to communicate and defend their points, with some sort of harsh-penalty prompt for not fulfilling their goal. There are some safety system prompts on Claude models that will trigger it to be very careful about what it writes. Like: you cannot make mistakes. "You need to ensure that it is correct or someone might end up hurt or even dead"
But you would need two agents and a setup to communicate via pipes or files.
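A minimal sketch of the "two agents and files" setup described above, assuming each agent is just a function that takes its own report plus the opponent's latest argument and returns a new argument. `stub_agent` and `debate` are hypothetical names invented for illustration; the stub only echoes text and makes no real model call, so this shows the plumbing, not the reasoning.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical agent signature: (own report, opponent's latest argument) -> new argument.
Agent = Callable[[str, str], str]


def stub_agent(name: str) -> Agent:
    """Stand-in for a real model call; it only echoes what it was given."""
    def respond(report: str, opponent_argument: str) -> str:
        rebuttal = opponent_argument or "(no argument yet)"
        return f"{name} defends '{report}' against: {rebuttal}"
    return respond


def debate(agent_a: Agent, agent_b: Agent, report_a: str, report_b: str,
           workdir: Path, rounds: int = 2) -> list[str]:
    """Alternate turns, passing each argument to the other side through a file."""
    inbox_a = workdir / "to_a.txt"  # messages addressed to agent A
    inbox_b = workdir / "to_b.txt"  # messages addressed to agent B
    inbox_a.write_text("")
    inbox_b.write_text("")
    transcript = []
    for _ in range(rounds):
        message_a = agent_a(report_a, inbox_a.read_text())
        inbox_b.write_text(message_a)
        transcript.append(message_a)
        message_b = agent_b(report_b, inbox_b.read_text())
        inbox_a.write_text(message_b)
        transcript.append(message_b)
    return transcript


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for line in debate(stub_agent("A"), stub_agent("B"),
                           "tear", "no tear", Path(tmp)):
            print(line)
```

Swapping the stubs for real model calls would keep the file exchange unchanged; whether the resulting back-and-forth converges on anything trustworthy is a separate question.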
BIackSwan 6 hours ago
Although it does not handle MRI files yet, I open-sourced an AI workflow that helps figure out issues in my family.
https://karankurani.github.io/OpenCareLoop/
It has helped me personally solve longer chronic problems in my family that doctors (understandably) just don't have the time to go into in depth.
It's in alpha and the AI hallucinates. Use with care. Feedback welcome.
An AI agent for personalized healthcare is inevitable. The cases such as the one posted are all solvable with time. AI has hallucinated and continues to hallucinate but the value we get in the space of coding can be extended to other domains.
parsabg 11 hours ago
We provide a second opinion service with certified human radiologists, if anyone's interested: https://expert.med
[-]
- coffeecoders 11 hours ago
  I need this but for dentists.
TSiege 19 hours ago
Always worth a share for this scenario. It's not clear if LLMs are capable of doing actual analysis on medical imaging. For details see this article https://futurism.com/artificial-intelligence/frontier-models...
> As detailed in a new, yet-to-be-peer-reviewed paper, a team of researchers at Stanford University found that frontier AI models readily generated “detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided.”
> In other words, the AI models happily came up with answers to questions about a supposedly accompanying image — even if the researchers never even showed it an image.
> As opposed to hallucinations, which involve AI models arbitrarily filling in the gaps within a logical framework, the team coined a new term for the phenomenon: “mirage reasoning.”
> The effect “involves constructing a false epistemic frame, i.e., describing a multi-modal input never provided by the user and basing the rest of the conversation on that, therefore changing the context of the task at hand,” the researchers wrote in their paper.
> The damning findings suggest AI models cheat by diving into the data they were given — and coming up with the rest based on probability, even if it’s almost entirely conjecture.
[-]
- kierangill 19 hours ago
  I work at a telemedicine company. We’ve benchmarked a few frontier LLMs on public medical imaging datasets. One test included high-quality and high-consensus otoscopic images. We didn’t anticipate the models to do well on something so niche, but what concerned us was how poorly calibrated the models were.
  I know you can’t trust an LLM’s self-assessed “confidence” of a prediction, but I’ve found that confidence can at least be directionally correct for some tasks. For our benchmarks, however, confidence was poorly correlated. What’s worse is that binary classification models (“Do you see $diagnosis in this photo?”) highly influenced the LLM to confidently predict $diagnosis.
  I’m concerned for those using LLMs for diagnostics, and getting confidently led to the wrong conclusion.
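  For readers unfamiliar with the term, here is a rough sketch of one common way calibration can be measured: expected calibration error (ECE), which bins predictions by stated confidence and compares each bin's average confidence to its actual accuracy. This is an illustration with made-up inputs, not the benchmark described above; the function name and bin count are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 would index past the end, so clamp it into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical data: a model that says "90% sure" but is right 1 time in 4.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, False, False, False]))
```

  A well-calibrated model scores near 0; a confidently wrong one scores high, which is the failure mode being described.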
  [-]
  - nostrebored 18 hours ago
    But the binary classification models can be made ternary easily. RL on congruence plus penalty for misdiagnosis is easy to set up and gives great results.
    What I’ve seen be the true bottleneck is people not setting up the structured data. But making a tiny reasoning model with OPSD -> GRPO is totally doable with a bit of money.
- appplication 19 hours ago
  It makes a lot of sense if you understand how these models work, but this was a cool read anyway, and studies like this are important for curbing the unfortunate fever dream some folks seem to be collectively having about LLM omnipotence.
- seanmcdirmid 19 hours ago
  I don’t understand how this is a different result than giving any LLM a task that is not completely grounded? I’ve observed this in coding tasks, if I forget to include a file referred to in the spec, the LLM will just hallucinate a version of it and my results suck. If I give it the file (and really, all the information I claimed it had access to), the task works fine. I fixed this in my pipeline with a prompt that does an extensive grounding analysis to determine if the assets I’m giving it are complete with respect to the spec (and that the spec is grounded as well, ie it doesn’t refer to something that is undefined).
  I wonder if the above problem can be fixed similarly? Just ask the LLM to do a conservative grounding analysis before jumping to the main task?
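  A simplified, deterministic stand-in for that kind of grounding check (the approach described above uses a prompt, which is not reproduced here): scan the spec for file references and report any that were not actually supplied. The regex, the extension list, and the name `missing_assets` are all assumptions made for illustration.

```python
import re

# Assumed pattern: path-like tokens ending in a handful of common extensions.
FILE_REF = re.compile(r"\b[\w./-]+\.(?:py|json|md|txt|yaml|csv)\b")


def missing_assets(spec: str, provided: set[str]) -> list[str]:
    """Return files the spec mentions that are not among the provided assets."""
    referenced = set(FILE_REF.findall(spec))
    return sorted(referenced - provided)


if __name__ == "__main__":
    spec = "Load config.yaml and run src/main.py."
    # Anything printed here should stop the pipeline before the main task runs.
    print(missing_assets(spec, {"src/main.py"}))
```

  The medical-imaging analogue would be refusing to proceed when the prompt refers to an image that is not attached, rather than letting the model describe one anyway.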
  [-]
  - pickleRick243 17 hours ago
    It's not different. There's a line of research and reasoning where people who don't use LLMs regularly point out issues that have been known (and more or less solved) for more than a year now (which is an eternity in the LLM space).
    [-]
    - seanmcdirmid 14 hours ago
      Ya, that’s what I guessed. I assume everyone who uses LLMs discovers this on their own eventually if they aren’t made aware of it before it happens.
- tracerbulletx 19 hours ago
  The absolute only thing that matters is if they are provided an image what's the success rate.
- consensus1 19 hours ago
  But why should I care? If you demonstrated that a model can perform more accurate diagnoses than a doctor, but also it had this strange behavior when no image was presented, why should that deter me from using the model?
  [-]
  - swiftcoder 19 hours ago
    Because you don’t have any way of telling if it actually used the image presented, or based its conclusions on a different image it made up.
    [-]
    - consensus1 14 hours ago
      I don't find that persuasive. This is not the error I worry about. Let's say that hypothetically the model just ignores the input image 1 in 10,000 runs. This really doesn't concern me because the output will be trivially detectable incorrect nonsense that doesn't match the symptoms at all. Such a contingency is easily handled by running the image through multiple models and distilling the output, anyway.
      The error I worry about is where the model uses the image and comes to an incorrect but symptom matching diagnosis. But in this hypothetical the model is less likely to do so than a doctor, so the choice is either accept the risk of the model or accept a higher risk from a doctor.
    - simianwords 18 hours ago
      Really? You know you could just ask it.
      [-]
      - swiftcoder 15 hours ago
        Which would tell you what, exactly? The whole root of the problem is that the model doesn’t “know” either
        [-]
        - simianwords 4 hours ago
          This is untrue and probably shows lack of experience with using LLMs. In my experience, each time I get some hallucination, I can ask the llm whether it hallucinated or not and I get a correct response.
          [-]
          - swiftcoder 1 hour ago
            > each time I get some hallucination, I can ask the llm whether it hallucinated or not and I get a correct response.
            You get a hallucination of a correct response, yes, and given that it's a yes or no question, this hallucination is more likely to be correct than the response to the original more complicated question. But make no mistake that it operates under the exact same constraints.
dwa3592 17 hours ago
Was it 2016 when Geoff Hinton said that radiology was a dead career?
Well, we now have the best model of our time (trillions of $$$ of investments) telling us something completely different (and wrong) from a human expert. I would really like someone to call out Dario, Sam, and Elon on these things and hear their explanations, but alas, a man can only dream.
[-]
- tarellel 17 hours ago
  It’s an odd field, obviously it’s in high demand for diagnostics and anytime you have to do an xray, MRI, etc you have to wait hours for one to become available.
  I think they’re artificially stunting the field to raise their wages. For example, in my city the medical school only accepts 11 people into the program a year (with an average graduation rate of 3-5). My niece has been trying for 2 years and finally got in this last year. Even radiology is doing AI-assisted diagnostics. Half my MRIs from this year have doctor notes and HealthBot (AI) notes attached to them.
  ~ I’m assuming other schools severely limit their radiology admissions as well. To keep the wages high and the field desirable.
  [-]
  - calvinmorrison 17 hours ago
    free market solution is just order an x-ray machine from alibaba and setup shop. you could add a credit card swiper + ID + facial recognition to plausibly avoid over-xraying people
    These days Xray machines - they don't even suit up in lead or stand behind a wall , just point and shoot. In fact they're nice and portable. I wish i had a xray machine at home.
    [-]
    - aix1 8 hours ago
      And, to save on combined shipping, a cobalt-60 machine to zap the odd tumour? :)
- muldvarp 16 hours ago
  > Was it 2016 when Geoff hinton said that radiology was a dead career?
  Funny how the jobs most at risk of automation now are tech jobs.
- nicman23 17 hours ago
  yeah that is why you would not use a random llm that is not trained in radiology lmfao
  diffusion models are probably a better bet for identifying irregular structures
KronisLV 7 hours ago
> So I'm left in a state of limbo where I either try my luck with another doctor or wait and see if my shoulder gets better with the rehab I'm doing.
Get a second opinion from another doctor. If that’s inconclusive, see three.
jochem9 19 hours ago
Right now the article reads as "AI can play doctor if you give MRI scans".
If the author would actually go for a second opinion (maybe bring along the AI to let it explain its findings), then the article could read as "AI did MRI analysis and proved my doctor wrong" (or: "AI did MRI analysis and failed").
thewanderer1983 12 hours ago
I was diagnosed with a rare blood disease called Essential Thrombocythemia (ET), which is part of a group of diseases called myeloproliferative neoplasms. This happened about three years ago. Recently, I decided to get a second opinion, and my new specialist changed my diagnosis from ET to Polycythemia Vera (PV). She also highly recommended I quickly go and give blood to lower my haematocrit levels, as they put me at a much higher risk of a blood clot. This is standard practice for people with PV but not people with ET. I decided to put the details into Google AI, the same details the original specialist used to diagnose me. Google AI predicted I very likely had PV instead of ET. I also asked Google AI how one could misdiagnose my condition as ET instead of PV, and it correctly explained how. My first specialist had used my high platelet count and a blood test that came back with a JAK2 mutation, followed by a bone marrow biopsy, to incorrectly diagnose me with ET. My high hemoglobin levels should have been checked by my first specialist as an indication of PV, not ET. Only the second specialist picked up on this. Google AI took five seconds and is free. The specialists cost $$$ and took weeks.
[-]
- throwaway762 12 hours ago
  Interesting. I am also diagnosed with ET (from platelet counts, pathology report noting abnormal platelet morphology, and JAK2 mutation from a marrow biopsy). What made you seek a second opinion?
madrox 12 hours ago
I had shoulder pain about ten years ago. Had an MRI. Found evidence of a tear. Was told I would need surgery and referred to a sports medicine doctor. He looked at my MRI and said the real problem was my shoulder was frozen, and he could do surgery but the PT after is what would actually be what helped me. Two radiologists and two doctors saw my MRI before this moment. Sure enough, with a little PT I got better.
I’ve had several more medical blunders since then, including a doctor telling me my problem is to lose weight 48 hours before going into emergency surgery.
What I have learned is to be wary of any time I feel like I’m in a “funnel.” Once you’re in the funnel, no one is thinking critically about your issue any more. One person said they found X. Next person reads that and assumes Y and recommends Z. And so on until the alpha is multiplied to hell. Lots of treatments that don’t hurt but don’t help and run up insurance.
I have since used AI the last couple years and it has either concurred with my doctors or given me enough ammo to challenge them. If I were the author, I would trust neither but use Claude to ask how to go back to that clinic and challenge the diagnosis.
Aeolun 19 hours ago
I would not use Claude to get a second opinion on anything that’s an image.
[-]
- rmbyrro 18 hours ago
  I agree with you for some kinds of images, but not all.
  LLMs are the best PDF-to-markdown converters, in my experience. I have a CLI that converts PDF to PNG, then run a background agent to "read" each PNG and write it down as markdown; it works flawlessly even for complex math formulas, it can "translate" complex charts, graphs, and tables into words.
  It's slow and arguably expensive compared to traditional OCR, but very effective and precise.
- maxall4 19 hours ago
  Especially an MRI, which is a 3D medium, something current LLMs are very bad at.
  [-]
  - lostlogin 17 hours ago
    > MRI which is a 3D medium
    The finer detail (which you may already know) is more complicated.
    MR does ‘2D’ scans which are a slice, then a gap of non-imaged tissue (typically 10% the slice thickness) then a slice. Each slice is an image with a number of pixels, say 320. Each pixel in the slice is small, eg 0.5mm but very thick due to the slice being thick, which is required for MRI signal. The pixels are 3mm in the shoulder scan done here.
    ‘3D’ scans don’t have a gap between slices, and are often isotropic, meaning the same resolution in all directions. The voxel (a pixel with depth) would be something like 1mm x 1mm x 1mm.
    3D scans are slow, prone to movement artifact and never as pretty in plane as a good 2D. You can reformat them to look ok in any plane.
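    To make the geometry above concrete, a small sketch using the figures quoted in this comment (0.5mm in-plane pixels, 3mm slices, a 10% gap, 1mm isotropic voxels). The slice count of 20 is a made-up number for illustration, and the function names are hypothetical.

```python
def voxel_volume_mm3(in_plane_mm: float, thickness_mm: float) -> float:
    """Volume of one voxel: a square in-plane pixel times the slice thickness."""
    return in_plane_mm * in_plane_mm * thickness_mm


def stack_extent_mm(n_slices: int, thickness_mm: float,
                    gap_fraction: float = 0.10) -> float:
    """Length covered by a 2D stack, including the unimaged gaps between slices."""
    gap_mm = thickness_mm * gap_fraction
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm


if __name__ == "__main__":
    # '2D' acquisition: fine in-plane resolution, thick slices.
    print("2D voxel volume (mm^3):", voxel_volume_mm3(0.5, 3.0))
    print("2D depth-to-width ratio:", 3.0 / 0.5)
    # '3D' isotropic acquisition: same resolution in every direction.
    print("3D voxel volume (mm^3):", voxel_volume_mm3(1.0, 1.0))
    # Assumed 20-slice stack: total extent versus tissue actually imaged.
    extent = stack_extent_mm(20, 3.0)
    print("stack extent (mm):", round(extent, 1))
    print("fraction imaged:", round(20 * 3.0 / extent, 3))
```

    The 2D voxel is six times deeper than it is wide, which is why a reformat of a 2D stack into another plane looks blocky while an isotropic 3D volume does not.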
  - amluto 19 hours ago
    I know little about radiology, but MRI is a 3D medium. I would not be at all surprised if one could slice an MRI the wrong way to produce a 2D image that fails to show a feature that exists in the source data.
- yolo3000 19 hours ago
  I used it on an ankle fracture xray, it was quite useful to make sense of things. But not like a 2nd opinion.
- behnamoh 19 hours ago
  What's wrong with Claude? I've asked it to analyze images and even Opus 4 would perfectly nail it.
  [-]
  - throwrioawfo 19 hours ago
    Sure, it can see obvious stuff in images, but as far as I'm aware it is not designed for (or tested on) performing the kind of microscopic analysis that radiology involves.
  - damontal 11 hours ago
    Throw a chess board on there. See how it does. It always gets pieces and positions completely wrong because it’s terrible at analyzing images.
  - nostrebored 18 hours ago
    Claude is the worst FM at image understanding. Prior to gpt-5.4 the only usable models were Gemini and Qwen.
tsoukase 17 hours ago
Medical opinion will remain one of the last frontiers of LLMs. There are so many critical factors that they are ill-suited for. They cannot perform a clinical exam, they have to collect the needed exams and, most importantly, a life might be at stake (OK, you cannot die from a shoulder problem, but you can become handicapped forever).
All that said, as a doctor I am totally open and even happy when a patient reports that they took advice from AI. I explain the holes in their reasoning and integrate it with mine. It helps rather than hurts the patient-doctor connection.
[-]
- chasebank 15 hours ago
  Do you think you would ever use LLMs in tandem in your practice to help diagnose and treat?
  [-]
  - tsoukase 15 hours ago
    I often ask, usually Gemini, at a medium abstraction level (neither as general as "What is the diagnosis?" nor as specific as "What does this high serum Ca mean?"). The answers are correct, but not complete enough or ready for consumption without a doctor's guidance.
    A cardiologist friend goes in deep discussions with a specialised model and he is amazed.
- fuomag9 16 hours ago
  I wish you were my doctor!
hectdev 17 hours ago
My only issue with this was the restriction of "Do not look at any data outside of our working folder" is preventing the tool from doing what it does best. I would have given it access to PubMed to pull the latest research on the subject and validate.
I wouldn't consider Claude itself to be the tool that does a job like this, but the tool that pulls in the best data and gives a supported suggestion. And then I'd go through a number of iterations on where it failed, to hone its assessment.
idopmstuff 17 hours ago
> It might seem obvious to coders, but the difference between Claude Code and Claude.ai's chat is enormous, even if those two run the same model.
In my experience, Claude Code is vastly better for doing tasks, writing code, etc., but Claude.ai is better for analysis and high-level planning. When I'm working on a new project, I've started using the latter to do the initial planning, get feedback and draw up a spec, which then goes to Claude Code.
For this project, I probably would've done something similar - use CC to get whatever you need out of the image files, but have Claude.ai do the actual review/diagnosing.
Either way, I often think about how far behind most of the world is in really understanding AI. The overwhelming majority of people would never guess that you get vastly different outcomes from the exact same model in a different harness (tbf most people don't know what a harness is). I spend hours every day using AI for a broad range of tasks and still feel like I know a fraction of what there is to know. I haven't even tried the new GLM model (or really any of the open source Chinese ones of the most recent generation). With so many people thinking that the free version of ChatGPT is SOTA AI, a lot of folks are in for a very rude awakening at some point soon.
chpatrick 16 hours ago
I recently had a pretty bad injury and out of curiosity I asked Gemini what it thought based on some CT scan slice images (and no other information). Surprisingly it came to exactly the same diagnosis and treatment plan as my doctors, but the big advantage is that I could ask it follow up questions any time, whereas the doctors barely explained anything.
[-]
- dwd 11 hours ago
  This...
  Gemini will ask for the specific images it needs to see and show you examples of what each slice will look like.
  But unlike the specialists, who often come across as abrupt and rush you, Gemini will happily take you on a deep dive and continue to answer all your follow-up questions indefinitely.
skybrian 19 hours ago
Getting an actual second opinion seems like the next step?
FredF--- 8 hours ago
Thank you for sharing. I'm about to get my MRI scan on my right shoulder tomorrow. Now I'm wondering what I'm going to do even if the result shows benign. Certainly, I would double check the result if it's concerning, should I also check a comforting result in case the truth is less comforting?
_carbyau_ 10 hours ago
This seems less a demonstration of competence than of accessibility.
Often we only get 10-15 minutes with the one health professional that makes a determination and sets the path of your life for some time.
As opposed to being able to spend hours with an LLM that in many ways feels more sympathetic and helpful, even though its competence is in question.
So here we are:
1. with doctors processing patients every 10 minutes like a machine.
2. and machine's processing, for a patient, at any hour like a human.
intoXbox 19 hours ago
Radiologists very often have to weigh up different theories, guidelines based on the symptoms. The certainty of their diagnosis is their added value, or if they don’t know they will tell you why.
An AI telling you it could be X or Y because theory ABC… is the academic answer and a luxury clinicians don’t have. AI doesn’t give you what you want. I don’t see any added value in using generic AI models for this
shiandow 5 hours ago
I read the article, I read the AI's verdict, I read the comments here, and I still don't know if OP does or does not have a tear in their tendon!
The AI doesn't present evidence I can understand, it doesn't even present a plausible explanation why someone could conclude it is a tear.
The main things that made OP suspicious are the possibly unnecessary shockwave therapy, which seems harmless, and the use of a homeopathic gel that I would classify as more of a herbal medicine, because it contains several ingredients I know people use for shoulder pain, some of them in concentrations that might even have an effect.
If this is the best rebuttal AI can come up with I would trust the diagnosis. But then OP never trusted the diagnosis and now they have several they cannot.
lucfranken 18 hours ago
Why wouldn’t you as a doctor by standard run the images through a certified compliant LLM? The actual cost won’t be the obstacle, and then you can see if you get any new ideas from it. See if it’s just wrong or whether it spotted a little detail you missed.
The LLM doesn’t need to be leading or whatever but then you can have a conversation with the patient. If their ChatGPT report has differences it can be analyzed as well.
It feels like the time constraint of the 15m doctor sessions is the thing. But if prepared immediately after the scan then why not?
There is always time needed to factor in new developments and innovations and that’s fine. Just blindly moving work from human to LLM is wrong. But learning on and testing with all the AI tools constantly coming in won’t be a waste. There will be more and more tools in those processes outside of human judgement, so better to improve the workflows now to be able to test and plug in new models and systems when they are ready.
[-]
- KaiserPro 18 hours ago
  > standard run the images through a certified compliant LLM?
  Because they don't exist, yet.
  In the UK, MRIs and other imaging systems need two opinions. There has been a move to allow the first opinion to be ML based.
  The _problem_ is that you are basically doing grey smudge analysis, and that's fucking hard.
- foobarian 18 hours ago
  I've been starting to think of LLMs as a great tool for "lead generation," borrowing a term from sales. Most of the things it comes up with don't pan out, but in many cases it's things we wouldn't have thought of, or at least not as quickly. This is especially true in the context of web service or SaaS outages.
- yread 18 hours ago
  Because they might bias you. And because you have your own brain, training and experience
  [-]
  - lucfranken 3 hours ago
    That does make sense but the order of it might be: doctor analysis. Then show LLM as double check to doctor. Doctor assesses and may keep and improve the doctor's first analysis?
    [-]
    - yread 35 minutes ago
      Another problem is that general models' performance just sucks. From an upcoming conf. talk (in pathology) where they ran 2 Medgemma models on 100 slides with known diagnosis:
      > Results: Full concordance with the reference diagnosis was 8% (27B) and 5% (1.5 4B; McNemar p=0.68), while partial matches were 29% vs 20% respectively (McNemar p=0.053). When correct diagnoses anywhere in the differential were counted, 51% (27B) vs 30% (1.5 4B), with 27B significantly superior (McNemar χ²=12.1, p=0.0005). Site-level performance varied widely (30–100%). Both models reported HIGH confidence in ~99% of cases irrespective of correctness.
      i.e. highly confident, wrong 95% of the time. In 49% of cases the real diagnosis wasn't even on the models' differential. A doctor can hardly improve using something they can safely assume to be just noise.
      https://ecp2026.abstractserver.com/programme/#/scientific/de...
      [-]
      - lucfranken 23 minutes ago
        Totally agree on that, if you constantly have to look at something that is wrong 95% of the time there is no value. The expectation of course is that it will get better. But if it doesn't reach a certain level, it's useless.
        Not sure how that research compares to the claims being made by many that a second opinion via ai in the end led to changes in treatment. Likely people spent quite some time searching and figuring out. That would be a different and n=1 result. Don't have enough knowledge of that research to determine how much result can be gained when the models are managed in a way that produces better results.
        And of course how much time/effort/cost that would take. How much is custom and how much is an automated programmable flow.
beacon294 10 hours ago
I put an MRI of my body part into Gemini 2.5 Pro (for fun) and it said it was my brain. These models don't necessarily have that data.
mootothemax 18 hours ago
Can any LLM give you the rough pixel coordinates of an item it identifies in an image?
I found that while Claude, GPT etc could describe an image, there was no way to link the description back to specific pixels in the image itself. Not even to a bounding box or segment.
[-]
- gs17 5 hours ago
  Not as smart as modern frontier models, but Moondream and Molmo can do that sort of thing.
matsemann 16 hours ago
I tried the same on images of disks in my back. The ai picked a slice not from the middle and used that to say they were too small (since it was looking at a slice towards the edge) and basically told me my life was over and my pain would last forever.
Luckily my disks were fine. Wouldn't trust it. Additionally, an MRI of a pain-free, healthy human still would show lots of things and damage. Unless it coincides with a symptom, it's probably harmless. That's why the history is important when looking at images. Can't just upload something and hope for findings.
neves 9 hours ago
A coworker just spent a weekend with AI analysing his image data. No problem detected. The official diagnosis came later: he had a serious degenerative problem.
These models don't have curated image exams in their training data. You can't trust them.
geraneum 15 hours ago
> There's something incredibly peaceful about being in the hands of an expert you trust. You don't have to worry anymore and can let them guide you through the process. AI can absolutely shatter that feeling in an uncomfortable way
It's always something along the lines of incredibly peaceful, insanely powerful, extremely interesting, also scary and uncomfortable meanwhile feel like magical super powers and science fiction.
I'm telling you... words have lost meaning.
VladVladikoff 19 hours ago
Hey OP my wife had a subscap tear and went through with surgery. Recovery was ROUGH, she couldn’t use that arm at all for almost two months. It’s amazing how much this can cripple a person, we don’t realize how much we use both our hands for our daily lives until one is gone. Even basic stuff like cooking, bathing, etc. If you can avoid surgery you should. Try doing the Buckburger 12 (spelling?) shoulder physiotherapy regimen. You’ll need to even if you get surgery, but this can help with tendinopathy. Also try to identify what is causing the repetitive stress and cut back on that activity.
[-]
- busymom0 18 hours ago
  I do powerlifting and a couple of years ago, I developed bicep tendinitis in my right arm. Even a tiny bit of weight on it while palm facing up would cause crazy pain. It was funny how I went from lifting heavy weights to not even being able to carry a plate of food, not being able to press soap dispensers, or give a spot to someone at the gym.
  Even a tiny injury can severely cripple us.
  [-]
  - dwd 11 hours ago
    Same, sprained my wrist/forearm a while back and couldn't rotate it without pain or take any weight palm up. Couldn't even rotate a door knob.
    It wasn't until I pushed through with weights, avoiding any underhand grips or rotation, that it started getting better. Doing bicep curls but keeping the thumb up strengthened the forearm to the point where I was back to the weights I was lifting and could then gradually add some rotation.
LogicFailsMe 18 hours ago
I did the same exercise here with medical reports and CT scans for a friend's cancer diagnosis and I got ahead of the oncologists predicting they were about to be cured. Spoilers: yep, cancer free now.
And well, yes, I have the appropriate life science degrees to navigate clinical trial reports and research publications, and that was likely indispensable for steering Claude Code where it went, the radiologist's caution is merited here. But it's just not amateur hour for me to do this, it's 2 decades of academic research in my rearview mirror.
darepublic 18 hours ago
I would like it if we could have a site where you submit your MRI then doctor commenters anonymously post their opinion. In general I want a forum where.. when people come with questions for which there are varying opinions we don't just have people leave their 2c and then jet. The thread persists, duplicated ideas get merged, erroneous statements get purged and gradually we refine shining truth.
[-]
- lostlogin 17 hours ago
  I’m wondering how many radiologists want to work all day, then come home and work.
  Many can get paid fee-for-service for after hours work, so would probably prefer that.
fabioz 19 hours ago
I wouldn't trust anything from Claude here image-wise (maybe it's reasonable for getting a 2nd opinion on the report itself and the treatment), but also, in cases where there is something serious, go to at least 2 different doctors and if they have different opinions go for a 3rd for a decisive vote, besides doing your own research (it's not that uncommon for hard cases to be badly diagnosed).
mistic92 19 hours ago
I have used Gemini 3.1 Pro through CLI to analyze my DICOM images. It gave me the same diagnosis as radiologists. But it was just an interesting test.
yaroslavvb 16 hours ago
I had a similar experience. Claude made a report of an MRI for an Achilles tear and measured the gap, but it was completely hallucinated. The Achilles tendon is black on the MRI, and it instead measured a 13mm distance between two completely different things (which looked white). The radiologist looked at it and saw no gap at all.
anigbrowl 17 hours ago
I use LLMs every day and value the benefits they offer, but this approach seems misguided. A smarter way to use them would be to consult the LLM before seeing the specialist and ask it to bring you up to speed on capabilities/limitations and develop a list of important questions to ask.
curevalue 14 hours ago
As everyone knows here a large language model using probabilistic next phrase approach will not be able to "diagnose" results from an MRI - at least not with enough confidence. It lacks the patient's history and a snapshot along with very varying datapoints in the training set will lead to "approximate" results. Not what a true doctor with a 360 view of a particular patient's health would be able to diagnose. That 'approximation' will get "better" in time - which only means the results will be replaced by yet another approximation.
hmokiguess 10 hours ago
Nice, I have Claude, now I just need an MRI (which in some countries is sort of a hard requirement still unfortunately, long queues)
bryanrasmussen 17 hours ago
I am reminded of the old saying that anyone who diagnoses themselves has a fool for a doctor.
quarkcarbon279 16 hours ago
As a developer I have many times seen Claude's models confidently hallucinate and jump to conclusions. Fable, though I used it for just 2 days, I didn't experience that much with in the short term.
lycos 17 hours ago
I'm surprised about the 266 MB of DICOM images, I've never had an MRI but my CT results are generally between 1-2GB (zipped) and I always assumed an MRI would have more data, guess I was wrong about that!
twodave 16 hours ago
AI use is such a polarizing topic anymore. What ever happened to just waiting and seeing how it all plays out? Since probably none of us is going to be able to predict it anyway.
terzioglubaris 18 hours ago
Hey, glad you did that. I did the exact same thing last week, but the radiologist's interpretation and Claude's interpretation were pretty much the same! You want my doctor's number? lol
ltbarcly3 2 hours ago
I had Claude interpret a photo of the heart monitor when I was in the ER last year. It said I was about to die. I was fine.
GuestFAUniverse 10 hours ago
German doctors are very prone to quackery. Including their nurses.
I've overheard a nurse at a university hospital argue with a patient who used tape on himself about the color of the tape. She was worried he might use the "wrong color". Again: at a university hospital, where they teach MDs. Half of them recommend homeopathic remedies.
In short: quacks.
davikr 18 hours ago
You can try sending basic chest radiographs to GPT and it'll fail at interpretation. I'd be wary of premature conclusions.
algoth1 16 hours ago
I love how the doctors injected basically water. I imagine the doctor thinking "we did all we could"
Aurornis 17 hours ago
> They injected me with Traumeel, which is registered in Germany as a homeopathic medicine "without a therapeutic indication".
This single sentence provides a huge clue about what’s going on: This person’s medical team is not good. It’s not hard to get an LLM to perform better than a team that is injecting homeopathic botanical formulations and performing procedures that aren’t indicated for the condition.
I think the real takeaway from this article shouldn’t be “ChatGPT is better than doctors”. It’s a story about LLMs identifying that someone was not in good hands.
[-]
- ashikns 8 hours ago
  I am confused why none of the experts weighing in here address this at all. Like I get that AI is generally disliked, but ignoring facts only makes me want to not trust doctors.
- zer00eyz 16 hours ago
  > I won't go into the details, but he suggested I get an MRI, which the clinic conveniently had available.
  And
  > They performed shockwave therapy on my shoulder
  (a procedure that may not be effective, but is unlikely to cause any harm)
  It's not just about LLMs being better, it's about people not trusting doctors any more: https://www.physiciansweekly.com/post/the-erosion-of-trust-i...
  If we want to fault the article for anything it's that he didn't take that information and go get a 2nd opinion from someone who IS more informed.
  [-]
  - Groxx 14 hours ago
    Any medical-field-position that recommends homeopathic stuff is instantly in my "full of shit and not trustworthy on anything" list, and I'd go elsewhere immediately and file complaints anywhere I could. There's no excuse at all, they're either fools or scammers, and I want neither anywhere near my (or anyone else's) health.
    That said, while I do see homeopathic stuff with that name, it's worth verifying that it isn't just a naming conflict. They're not always unique, particularly across countries, and Traumeel seems to be more of a brand than a specific thing.
cityofdelusion 17 hours ago
It's very interesting how people trust LLMs in domains they know little about.
Instead, it is my experience with LLMs in a domain that I know very well that makes me skeptical of their performance across the board. I find issues in code review multiple times a day with their output, and they are explicitly and extensively trained on this use-case, unlike with the MRI data. Sometimes I veer into other domains I have decent knowledge about (construction, carpentry, landscaping) and LLMs disappoint me there as well.
I suppose Gell-Mann amnesia is a universal human quirk and not restricted to just the news.
skeptrune 10 hours ago
the more awesome thing to me is that you can run the MRI through an ensemble of LLMs and check to see if they converge among each other
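A minimal sketch of what such a convergence check could look like, assuming each model's free-text answer has already been reduced to a short diagnosis label. The model names and labels below are made up for illustration, and no real model API is called:

```python
from collections import Counter


def convergence(labels):
    """Return the most common label and the fraction of models that gave it."""
    # Normalize case and whitespace so trivially different spellings match.
    normalized = [label.strip().lower() for label in labels.values()]
    top_label, votes = Counter(normalized).most_common(1)[0]
    return top_label, votes / len(normalized)


# Hypothetical outputs from three models for the same scan (illustrative only).
answers = {
    "model_a": "Partial subscapularis tear",
    "model_b": "partial subscapularis tear",
    "model_c": "Tendinopathy, no tear",
}

label, agreement = convergence(answers)
print(f"{label}: {agreement:.0%} agreement")
```

Agreement here only measures whether the models converge, not whether they are right; models with similar training data could plausibly agree on the same wrong answer.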
chasebank 15 hours ago
Anecdote on healthcare, adjacent to this.
My dog had been acting off. Wouldn’t eat, was hunched over, looked sad. We took him to a local vet who did an X-ray because they suspected a blockage. They didn’t see one, so they sent us home with standard pain meds.
Randomly, we had a dinner party that night and another vet was there. She heard the story and immediately said, “Go home right now and take your dog to an emergency vet with ultrasound.”
Turns out, at the time, most vets had been trained to use X-rays to look for blockages, but newer evidence showed X-rays were only something like 20% effective compared to ultrasound, which was closer to 95%. (I forget the percentages but something like that.)
The ultrasound found an avocado pit stuck in his intestine. He had emergency surgery that night.
That chocolate chunk of an English Lab ended up living until 15, and only needed two more blockage surgeries after that...
I know doctors hate patients reading the internet, and LLMs are going to make that 1000% worse for them. But hopefully over time, we all adapt together and end up better off in the long run.
Gareth321 17 hours ago
I have had terrible experiences with medical professionals. Especially the experienced/senior/specialists. First, they just don't have the time to do a thorough research of my medical history. Second, they are often arrogant and resistant to any kind of critical questions. They have an apparently unwavering belief that they are correct. In fairness, they probably usually are, but they are not infallible, and they are at their weakest when it comes to the edge cases.
AI is completely without ego, and can process all my medical records in minutes. In truth, even today, I would rather have an AI analyse my records.
sehw 15 hours ago
I used my dog to clean my room.
quacked 18 hours ago
The thing that annoys me about AI discourse is that AI is a mathematical technique of rapidly increasing efficacy, and yet everyone personifies it. It would help if every time someone said "AI" they supplemented "a mathematical method where extensions onto a very large corpus of information are statistically simulated".
It's not true that "AI makes mistakes" or "ChatGPT is sycophantic". It's just that sometimes the simulated extensions to the training material are accurate, and sometimes they're not.
[-]
- hawkice 18 hours ago
  I think this draws too strong a line between the matrix-math core and the harness that uses it. Those harnesses undoubtedly were built with purpose and the systems fail to achieve that goal. Common usage says the DMV can make mistakes, like any system, despite the DMV itself not being a person (and it is common to allege large organizations make mistakes even when no specific individual is making an identifiable mistake). This isn't person-language, it's systems/purpose-language.
  [-]
  - quacked 18 hours ago
    I understand and somewhat agree with your point, and might have phrased my comment differently. I think my main point is that experts aren't always going to beat "a dynamically simulated extension onto the training material". Often they will, maybe even usually, but sometimes they won't, and I feel like the people in this thread insisting that the experts will always know better are thinking about a competition between experts and a crazy robot instead of a competition between experts and math.
lutusp 17 hours ago
> There's something incredibly peaceful about being in the hands of an expert you trust. You don't have to worry anymore and can let them guide you through the process.
> AI can absolutely shatter that feeling in an uncomfortable way ...
I see this as a field report in a time of fundamental transition, from a world without AI, to one that accommodates/incorporates AI. For this to happen, AI will need to become more trustworthy. As for the U.S. medical system, it can't get much worse.
I recently had a similar experience (meaning walking a fence between old and new methods), where I was told I could get an appointment with a human medical practitioner in nine months. So, to resolve my anxiety I consulted AI and got an instant diagnosis, one that was later confirmed by the inaccessible medics.
Being a born skeptic I wasn't going to act on AI's diagnosis, I just wanted to know what was going on, resolve some uncertainty. Another advantage: an AI chatbot doesn't say, "Wait, you're on Medicare? Hmm. See you in nine months."
Don't take this as an endorsement of AI's diagnostic abilities -- it's way too soon for that. In my case it was a slam dunk, about a condition I knew nothing about.
Kapura 18 hours ago
I asked a bird about my father's potential prostate cancer. It gave extremely good advice.
jongjong 12 hours ago
I don't know which country this is but this sure seems like a country where doctors have an incentive to maximize the amount/cost of treatment.
I've seen the two extremes in different countries; either they have a tendency to maximize the complexity of the medical situation, or they minimize it "Don't worry, it's just stress" - I've been to different doctors in different countries and I see a pattern based on the country and the incentive structure. In some countries, they will send you off to do a scan for the slightest malaise.
I don't think it's about quality/coverage of public healthcare (at least not on its own; I have not seen a clear pattern across this axis). I think the difference is to do with the referral system. In countries where you can't go directly to a specialist and need a referral from a General Practitioner/Physician first (i.e. in order to get a refund), you tend to get more false negatives from the GP which block you from going to the next stage: "It's nothing, just stress-related." In countries where you have the option to go directly to a specialist, they tend to be much more trigger-happy in terms of giving you a full workup and GPs/Physicians will more easily refer you to a specialist.
And I feel like the attitude extends to the specialists themselves. I suppose making people go to a GP first creates a kind of efficiency and predictability which alleviates pressure to exaggerate the severity of the situation.
j45 12 hours ago
Imaging is one area where patients will be able to become more educated to ask better questions.
The areas of premature heavy interventions can be a challenge, especially where there might be room for interpretation and the medical professional didn’t share all possible options.
It’s critical to ask all professionals for all possible options and write each down as they write and explain it. No one’s perfect, and not everyone is negligent or malicious.
light_hue_1 14 hours ago
Go with your report back to your doctor.
A family member has cancer and we treat chatgpt as part of the team (our doctor's words). I ingest everything into it, work with it to make a good report. Then at the next visit we review it.
This gives you the best of both worlds. You get peace of mind and the doctor explains why and how the agent was right or wrong.
Twice now we've caught consequential mistakes (wrong pain medication and incorrect notation of the exact mutation that he has). Which have made a difference to his quality of life and treatment path.
Most of what the doctors have said is in line with the agent but when there have been disagreements they've been very reasonable. Sometimes the doctors have gone with the agent's version sometimes they've explained why that's inappropriate.
neilv 19 hours ago
This could be a starting point for consulting a different human expert for a second opinion (e.g., specific questions to ask about), but I wouldn't put much trust in Claude alone on this.
IME, on an almost daily basis, claude.ai and Claude Code are confidently wrong about something, and use polished language to assert nonsense.[*]
If it's doing that on something easy, like factual knowledge available in text on the Internet, or programming code that can be inspected easily and follows well-known rules, and I can tell, because I understand those things... then there's no way I'm going to assume that Claude doesn't also BS when it comes to someone else's field. Especially not a field that requires some of the smartest people to go through a decade of training, just to get started in the field.
[*] And if I confront Claude with its mistakes, eventually it apologizes, and acts as if it's learned something, again mimicking word patterns it's heard real people use and mean, without meaning any of it. I wonder whether the AI user experience would be better, if LLM-ish interfaces weren't implicitly created in the image of fake-it-till-you-make-it overconfident performative sociopathic techbros.
KerrAvon 16 hours ago
Given the tenor of the comments on this article, I think reading TFA is super important, especially the author's disclaimer at the end, where they state that they're definitely not blindly trusting the AI at this stage, just that they find the differential unsettling.
paul7986 16 hours ago
Went to a new dentist recently and his staff took x-rays of my teeth. While I was waiting for him to come speak with me about what the x-rays showed, I just took a pic and uploaded it to Gemini. 9 months back my previous dentist said I should have a filling or that potentially a crown was needed. I told Gemini this and that I've only had about 3 fleeting pain issues in that area. With the x-ray and that info Gemini told me the exact same thing the dentist later came in and told me: if pain comes back and for long periods of time then there's an issue, as the x-rays look fine.
Overall I see a great opportunity for x-ray techs (radiographers, even though Jensen from NVidia says radiology, the step above, is the first field he recommends not getting into) to open their own businesses for people who want to use AI for self care and help. Have one doctor or dentist on staff to use as needed.
tibbydudeza 17 hours ago
I would rather trust a doctor with decades of experience and his diagnosis and treatment plan than some LLM.
It's like using WebMD for any ache and pain and it saying it might either be lupus or cancer.
throwawayffffas 11 hours ago
Fucking doctor google bullshit, they want medical treatment but don't want medical advice...
late2part 19 hours ago
If you have 2 clocks you have none.
[-]
- mcapodici 18 hours ago
  Or you have an interval?
gaolei8888 17 hours ago
I have already done that several times, and I found the comments from ChatGPT/Claude to be absolute bullshit.
simianwords 19 hours ago
Everyone is talking about how doctors know better or have some context that is not shown here.
But are you all forgetting that they literally injected a homeopathic drug on the author?
Between that and Claude sometimes hallucinating, it’s probably worth encouraging patients to always get a second opinion.
[-]
- sxg 16 hours ago
  > But are you all forgetting that they literally injected a homeopathic drug on the author?
  I'm no fan of pseudoscience either, but this is where things get blurry. The placebo effect is real even if patients are aware of it. If you give a patient a homeopathic drug while informing them of potential side effects (if any), and then they feel better, have you hurt them? Or have you helped them?
  I personally have no interest in trying homeopathic medicines, but the reality is that many patients do take these and are adamant they help. As long as any risks are communicated and there are no serious side effects, it's difficult to make an argument against their use in patients who report a subjective benefit.
  [-]
  - ashikns 8 hours ago
    This is relying on the patient being stupid. I would always prefer just an honest explanation of things rather than pseudo-science drugs. And if I do discover that it's a pseudo-science drug then I've lost all confidence in that doctor. Doctors should stop pretending that they have access to some divine knowledge and everyone else is stupid.
    And this interpretation is charitable, assuming that they wanted the patient to feel better via placebo. A different (and more likely) interpretation being they just wanted to charge for something extra.
anon291 12 hours ago
I've found ChatGPT to be much better at medicine than doctors. For example, every winter, I would get itchy toes. I was quite concerned because they could be quite painful. But the symptoms were not obvious and they wouldn't occur often. The toes would swell up and become quite red and uncomfortable. One doctor suggested gout, which was not the right diagnosis, because I have no urea problems. Others suggested a skin cream.
No... I told ChatGPT exactly what I told you and it came up with the answer: chilblains, which should have been obvious given everything I described, yet general practitioners were clueless and often reached for high intervention approaches.
Harmless condition fixed by wearing socks. I brought it up with the same GPs who had misdiagnosed me and none had heard of it.
Of course, I'm cognizant that it could be mistaken, but a hospital fed my diabetic aunt a normal sugar diet while she was in a coma and forgot to give her metformin, so I mean, it's not like humans can't be retarded as well. The difference is no one gets offended when I point out ChatGPT has the capacity to be an idiot. Instead they just fix it.
zephen 17 hours ago
> There's something incredibly peaceful about being in the hands of an expert you trust.
I want to know if this is a religious thing, or is related to never having had multiple doctors so bad it seemed like they were actively trying to kill you, or both. I've never had this peaceful experience personally within the realm of healthcare.
> AI can absolutely shatter that feeling in an uncomfortable way
Good. Reality is always good.
> but I don't know if I can fully trust AI either.
WTF??!? Why on earth would anybody ever think they could fully trust LLMs? Even their most vocal proponents concede they aren't infallible panaceas.
hennell 19 hours ago
Personally my favourite feature of the new AI world is not when I use it directly, but when one of my managers uses it to try to fix a problem, then issues their findings to me, and I have to defend my process to someone who understands neither my process, their suggested solution, nor often the problem they're solving in the first place.
[-]
- cube00 18 hours ago
  It gets worse when they challenge your solutions by feeding them back into the LLM and sending the response on to you. Arguing with an LLM is exhausting; arguing by proxy with a human parroting its responses is excruciating.
  On the plus side when they do this they can't flood your calendar with those "quick chat" meetings because they know they won't be able to hold a conversation on the issue beyond the first minute.
  [-]
  - lukeinator42 15 hours ago
    This happened to me on a paper I submitted recently where it was clear the reviewers used AI. Revising a paper based on LLM review is also exhausting, haha.
- NegativeK 17 hours ago
  I've seen coworkers do this to each other when their expertise is in different domains.
  I find that AI can be incredibly useful, but just text dumping its output into a conversation feels insulting.
- willsmith72 18 hours ago
  True, but this was a problem long before AI (read this article, met this guy at a conference who told me x, my boss said blah)
  AI probably exacerbates it but crappy managers exist regardless
  [-]
  - nitwit005 18 hours ago
    Before maybe you had to deal with someone hiring sketchy consultants once in a while, but now the managers have a limitless well of dubious answers to draw on at any time.
    [-]
    - darkwater 18 hours ago
      But now you have a new tool in the upmanagement toolbox: subtly tell them to implement their idea in prod with Claude Code, and see it for themselves.
      [-]
      - VeninVidiaVicii 17 hours ago
        Yeah dealing with this now, where my CTO is shipping features that are producing plausible results but just wrong. So, now I gotta spend all day explaining the math behind certain features to her, and she copies and pastes it to Claude.
- duxup 17 hours ago
  Sometimes I get a lot of "Do you want me to work up how the UI will look."
  They give me what they'd like the UI to look like, but none of the actual content fits outside the one situation they're thinking of.
  ¯\_(ツ)_/¯
  Thankfully where I work now everyone is good about taking no for an answer.
- mystifyingpoi 17 hours ago
  Fight fire with fire. It's over the top passive-aggressive, but it works. Whenever I get a JIRA ticket that was clearly AI generated and is 10x too many words, I tell Claude to respond to that ticket with my actual real opinion or suggestion, but make it 10x more words.