Erdos 281 solved with ChatGPT 5.2 Pro

(twitter.com)

308 points | by nl 20 days ago

30 comments

xeeeeeeeeeeenu 20 days ago
> no prior solutions found.
This is no longer true; a prior solution has just been found[1], so the LLM proof has been moved to Section 2 of Terence Tao's wiki[2].
[1] - https://www.erdosproblems.com/forum/thread/281#post-3325
[2] - https://github.com/teorth/erdosproblems/wiki/AI-contribution...
[-]
- nl 20 days ago
  Interesting that in Terence Tao's words: "though the new proof is still rather different from the literature proof)"
  And even odder that the proof was by Erdos himself and yet he listed it as an open problem!
  [-]
  - pfdietz 17 days ago
    The theorem is implied by an older result of Erdos, but is not a result of Erdos. Apparently this is because the connection is something called "Rogers' Theorem" that was quite obscure.
    https://terrytao.wordpress.com/2026/01/19/rogers-theorem-on-...
    "This theorem is somewhat obscure: its only appearance in print is in pages 242-244 of this 1966 text of Halberstam and Roth, where the authors write in a footnote that the result is “unpublished; communicated to the authors by Professor Rogers”. I have only been able to find it cited in three places in the literature: in this 1996 paper of Lewis, in this 2007 paper of Filaseta, Ford, Konyagin, Pomerance, and Yu (where they credit Tenenbaum for bringing the reference to their attention), and is also briefly mentioned in this 2008 paper of Ford. As far as I can tell, the result is not available online, which could explain why it is rarely cited (and also not known to AI tools). This became relevant recently with regards to Erdös problem 281, posed by Erdös and Graham in 1980, which was solved recently by Neel Somani through an AI query by an elegant ergodic theory argument. However, shortly after this solution was located, it was discovered by KoishiChan that Rogers’ theorem reduced this problem immediately to a very old result of Davenport and Erdös from 1936. Apparently, Rogers’ theorem was so obscure that even Erdös was unaware of it when posing the problem!"
  - TZubiri 20 days ago
    Maybe it was in the training set.
    [-]
    - magneticnorth 20 days ago
      I think that was Tao's point, that the new proof was not just read out of the training set.
      [-]
      - rzmmm 20 days ago
        The model has multiple layers of mechanisms to prevent carbon copy output of the training data.
        [-]
        TZubiri 20 days ago
        forgive the skepticism, but this translates directly to "we asked the model pretty please not to do it in the system prompt"
        [-]
        ffsm8 20 days ago
        It's mind boggling if you think about the fact they're essentially "just" statistical models
        It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth
        [-]
        glemion43 20 days ago
        They are not just statistical models
        They create concepts in latent space which is basically compression which forces this
        [-]
        jrmg 19 days ago
        You’re describing a complex statistical model.
        [-]
        glemion43 19 days ago
        Debatable, I would argue. It's definitely not 'just a statistical model', and I would argue that the compression into this space fixes potential issues differently than just statistics.
        But I'm not a mathematics expert; if this is the real official definition, I'm fine with it. But are you though?
        [-]
        inimino 19 days ago
        I am, and yes, that's what a statistical model is.
        mmooss 19 days ago
        What is "latent space"? I'm wary of metamagical descriptions of technology that's in a hype cycle.
        [-]
        AIorNot 19 days ago
        See this video
        https://youtu.be/D8GOeCFFby4?si=AtqH6cmkOLvqKdr0
        DoctorOetker 19 days ago
        It's a statistical term: a latent variable is one that is either known to exist, or believed to exist, and then estimated.
        Consider estimating the position of an object from noisy readings. One presumes that position to exist in some sense, and then one can estimate it by combining multiple measurements, increasing positioning resolution.
        It's any variable that is postulated or known to exist, and for which you run some fitting procedure.
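        The position example above can be sketched in a few lines. This is only a toy illustration, assuming Gaussian noise and a plain average as the "fitting procedure"; the function name and parameters are made up for the example.

```python
import random
import statistics


def estimate_position(true_position, noise_std, n_readings, seed=0):
    """Estimate a latent (unobserved) position from noisy readings.

    The true position is never read directly: only noisy measurements
    are observed, and the estimate is their mean.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    readings = [rng.gauss(true_position, noise_std) for _ in range(n_readings)]
    return statistics.fmean(readings)


# Hypothetical setup: the object sits at 10.0 and each reading has unit noise.
estimate_few = estimate_position(10.0, 1.0, 5)
estimate_many = estimate_position(10.0, 1.0, 5000)
print(estimate_few, estimate_many)
```

        With more readings the standard error of the mean shrinks roughly as 1/sqrt(n), which is the "increasing positioning resolution" part.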
        glemion43 19 days ago
        I'm disappointed that you had to add the 'metamagical' to your question tbh
        It doesn't matter if ai is in a hype cycle or not it doesn't change how a technology works.
        Check out the yt videos from 3Blue1Brown, he explains LLMs quite well. Your first step is the word embedding; this vector space represents the relationship between words. Father - grandfather. The vector which makes a father a grandfather is the same vector as mother to grandmother.
        You then use these word vectors in the attention layer to create an n-dimensional space aka latent space which basically reflects a 'world' the LLM walks through. This makes the 'magic' of LLMs.
        Basically a form of compression by having higher dimensions reflecting a kind of meaning.
        Your brain does the same thing. It can't store pixels so when you go back to some childhood environment like your old room, you remember it in some efficient (brain efficient) way. Like the 'feeling' of it.
        That's also the reason why an LLM is not just some statistical parrot.
        [-]
        mmooss 19 days ago
        > It doesn't matter if ai is in a hype cycle or not it doesn't change how a technology works.
        It does change what people say about it. Our words are not reality itself; the map is not the territory.
        Are you saying people should take everything said about LLMs at face value?
        [-]
        glemion43 19 days ago
        Being dismissive of technical terms on hn because something seems to be a hype is really weird.
        It's the reason why I'm here because we discuss more technically about technology
        [-]
        mmooss 19 days ago
        I wasn't dismissive, just wary. As a new account, it's odd to be lecturing people on behavior. You're the one diverting the conversation.
        [-]
        glemion43 19 days ago
        I've been on HN for 10 years.
        I spend too much time here and decided to delete my account to interact less.
        It's partially working though
        GrowingSideways 20 days ago
        How so? Truth is naturally an apriori concept; you don't need a chatbot to reach this conclusion.
        mikaraento 20 days ago
        That might be somewhat ungenerous unless you have more detail to provide.
        I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.
        [-]
        TZubiri 19 days ago
        So it would be able to produce the training data but with sufficient changes or added magic dust to be able to claim it as one's own.
        Legally I think it works, but evidence in a court works differently than in science. It's the same word but don't let that confuse you and don't mix them both.
        guenthert 19 days ago
        Should they though? If the answer to a question^Wprompt happens to be in the training set, wouldn't it be disingenuous to not provide that?
        [-]
        ttctciyf 19 days ago
        Maybe it's intended to avoid legal liability resulting from reproducing copyright material not licensed for training?
        [-]
        TZubiri 19 days ago
        Ding!
        It's great business to minimally modify valuable stuff and then take credit for it. As was explained to me by bar-certified counsel "if you take a recipe and add, remove or change just one thing, it's now your recipe"
        The new trend in this is asking Claude Code to create software of some type, like a Browser or a DICOM viewer, and then publishing that it's managed to do this very expensive thing (but if you check source code, which is never published, it probably imports a lot of open source dependencies that actually do the thing)
        Now this is especially useful in business, but it seems that some people are repurposing this for proving math theorems. The Terence Tao effort which later checks for previous material is great! But the fact that Section 2 (for such cases) is filled to the brim, and Section 1 is mostly documented failed attempts (except for 1 proof, congratulations to the authors), mostly confirms my hypothesis. Claiming that the model has guards that prevent it is a deus ex machina cope against the evidence.
        ComplexSystems 19 days ago
        The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.
        efskap 20 days ago
        Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a bloom filter can be adapted
        [-]
        hexaga 20 days ago
        It's not the searching that's infeasible. Efficient algorithms for massive scale full text search are available.
        The infeasibility is searching for the (unknown) set of translations that the LLM would put that data through. Even if you posit only basic symbolic LUT mappings in the weights (it's not), there's no good way to enumerate them anyway. The model might as well be a learned hash function that maintains semantic identity while utterly eradicating literal symbolic equivalence.
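        A rough sketch of the Bloom filter idea from the parent comment, and of why it only catches literal overlap. Everything here is an assumption for illustration (word 5-grams as the unit, SHA-256 as the hash, a one-sentence stand-in for a training set); it is not how any real LLM product checks its output.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(text, n=5):
    """Lowercased word n-grams of the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def overlap_fraction(sample, bloom, n=5):
    """Fraction of the sample's n-grams that the filter reports as seen."""
    grams = ngrams(sample, n)
    if not grams:
        return 0.0
    return sum(gram in bloom for gram in grams) / len(grams)


# Hypothetical one-document "training set".
training_doc = "the quick brown fox jumps over the lazy dog near the river bank"
bloom = BloomFilter()
for gram in ngrams(training_doc):
    bloom.add(gram)

verbatim = "the quick brown fox jumps over the lazy dog"
paraphrase = "a fast brown fox leaps above a sleepy dog"
print(overlap_fraction(verbatim, bloom), overlap_fraction(paraphrase, bloom))
```

        The verbatim sample matches on every n-gram, while the paraphrase shares no 5-gram with the source and so goes undetected even though it says the same thing. That is the gap described above: literal lookup is cheap, but the set of transformations to search over is unknown.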
        glemion43 20 days ago
        Do you have a source for this?
        Carbon copy would mean over fitting
        [-]
        fweimer 19 days ago
        I saw weird results with Gemini 2.5 Pro when I asked it to provide concrete source code examples matching certain criteria, and to quote the source code it found verbatim. It said in its response that it quoted the sources verbatim, but that wasn't true at all: they had been rewritten, still in the style of the project it was quoting from, but otherwise quite different, and without a match in the Git history.
        It looked a bit like someone at Google subscribed to a legal theory under which you can avoid copyright infringement if you take a derivative work and apply a mechanical obfuscation to it.
        [-]
        Workaccount2 19 days ago
        LLMs are not archives of information.
        People seem to have this belief, or perhaps just general intuition, that LLMs are a google search on a training set with a fancy language engine on the front end. That's not what they are. The models (almost) self avoid copyright, because they never copy anything in the first place, hence why the model is a dense web of weight connections rather than an orderly bookshelf of copied training data.
        Picture yourself contorting your hands under a spotlight to generate a shadow in the shape of a bird. The bird is not in your fingers, despite the shadow of the bird, and the shadow of your hand, looking very similar. Furthermore, your hand-shadow has no idea what a bird is.
        [-]
        fweimer 19 days ago
        For a task like this, I expect the tool to use web searches and sift through the results, similar to what a human would do. Based on progress indicators shown during the process, this is what happens. It's not an offline synthesis purely from training data, something you would get from running a model locally. (At least if we can believe the progress indicators, but who knows.)
        int_19h 19 days ago
        While true in general, they do know many things verbatim. For instance, GPT-4 can reproduce the Navy SEAL copypasta word for word with all the misspellings.
        [-]
        Workaccount2 18 days ago
        I'd imagine more than a few basement dwellers could as well.
        NewsaHackO 19 days ago
        It is the classic "He made it up"
        Der_Einzige 19 days ago
        Source is just read the definition of what "temperature" is.
        But honestly source = "a knuckle sandwich" would be appropriate here.
        [-]
        dang 19 days ago
        Threatening violence*, even in this virtual way and encased in quotation marks, is not allowed here.
        Edit: you've been breaking the site guidelines badly in other threads as well. (To pick one example of many: https://news.ycombinator.com/item?id=46601932.) We've asked you many times not to.
        I don't want to ban your account because your good contributions are good and I do believe you're well-intentioned. But really, can you please take the intended spirit of this site more to heart and fix this? Because at some point the damage caused by poisonous comments is worse.
        https://news.ycombinator.com/showhn.html
        * it would be more accurate to say "using violent language as a trope in an argument" - I don't believe in taking comments like this literally, as if they're really threatening violence. Nonetheless you can't post this way to HN.
        Den_VR 20 days ago
        Unfortunately.
        GeoAtreides 19 days ago
        does it?
        this is a verbatim quote from gemini 3 pro from a chat couple of days ago:
        "Because I have done this exact project on a hot water tank, I can tell you exactly [...]"
        I somehow doubt an LLM did that exact project, what with not having any abilities to do plumbing in real life...
        [-]
        retsibsi 19 days ago
        Isn't that easily explicable as hallucination, rather than regurgitation?
        [-]
        ttctciyf 19 days ago
        Those are not mutually exclusive in this instance, it seems.
      - cma 19 days ago
        I don't think it is dispositive, just that it likely didn't copy the proof we know was in the training set.
        A) It is still possible a proof from someone else with a similar method was in the training set.
        B) Something similar to Erdos's proof was in the training set for a different problem and had a similar alternate solution to ChatGPT's, which was also in the training set; that would be more impressive than A)
        [-]
        CamperBob2 19 days ago
        > It is still possible a proof from someone else with a similar method was in the training set.
        A proof that Terence Tao and his colleagues have never heard of? If he says the LLM solved the problem with a novel approach, different from what the existing literature describes, I'm certainly not able to argue with him.
        [-]
        mmooss 19 days ago
        > A proof that Terence Tao and his colleagues have never heard of?
        Tao et al. didn't know of the literature proof that started this subthread.
        [-]
        pvab3 19 days ago
        there is an immense amount of stuff out there on arXiv that no one has ever looked at
        CamperBob2 19 days ago
        Right, but someone else did ("colleagues.")
        [-]
        habinero 19 days ago
        No, they searched for it. There's a lot of math literature out there, not even an expert is going to know all of it.
        [-]
        CamperBob2 19 days ago
        Point being, it's not the same proof.
        [-]
        mmooss 19 days ago
        Your point seemed to be, if Tao et al. haven't heard of it then it must not exist. The now known literature proof contradicts that claim.
        [-]
        nl 19 days ago
        There's an update from Tao after emailing Tenenbaum (the paper author) about this:
        > He speculated that "the formulation [of the problem] has been altered in some way"....
        [snip]
        > More broadly, I think what has happened is that Rogers' nice result (which, incidentally, can also be proven using the method of compressions) simply has not had the dissemination it deserves. (I for one was unaware of it until KoishiChan unearthed it.) The result appears only in the Halberstam-Roth book, without any separate published reference, and is only cited a handful of times in the literature. (Amusingly, the main purpose of Rogers' theorem in that book is to simplify the proof of another theorem of Erdos.) Filaseta, Ford, Konyagin, Pomerance, and Yu - all highly regarded experts in the field - were unaware of this result when writing their celebrated 2007 solution to #2, and only included a mention of Rogers' theorem after being alerted to it by Tenenbaum. So it is perhaps not inconceivable that even Erdos did not recall Rogers' theorem when preparing his long paper of open questions with Graham in 1980.
        (emphasis mine)
        I think the value of LLM guided literature searches is pretty clear!
        [-]
        casey2 19 days ago
        This whole thread is pretty funny. Either it can demo some pretty clever, but still limited, features resulting in math skills OR it's literally the best search engine ever invented. My guess is the former, it's pretty whatever at web search and I'd expect to see something similar to the easily retrievable, more visible proof method from Rogers' (as opposed to some alleged proof hidden in some dataset).
        [-]
        CamperBob2 18 days ago
        > Either it can demo some pretty clever, but still limited, features resulting in math skills OR it's literally the best search engine ever invented.
        Both are precisely true. It is a better search engine than anything else -- which, while true, is something you won't realize unless you've used the non-free 'pro research' features from Google and/or OpenAI. And it can perform limited but increasingly-capable reasoning about what it finds before presenting the results to the user.
        Note that no online Web search or tool usage at all was involved in the recent IMO results. I think a lot of people missed that little detail.
        heliumtera 19 days ago
        Does it matter if it copied or not? How the hell would one even define if it is a copy or original at this point?
        At this point the only conclusion here is: The original proof was in the training set. The author and Terence did not care enough to find the publication by Erdos himself
- davidhs 19 days ago
  It looks like these models work pretty well as natural language search engines and at connecting together dots of disparate things humans haven't done.
  [-]
  - pfdietz 19 days ago
    They're finding them very effective at literature search, and at autoformalization of human-written proofs.
    Pretty soon, this is going to mean the entire historical math literature will be formalized (or, in some cases, found to be in error). Consider the implications of that for training theorem provers.
    [-]
    - mlpoknbji 19 days ago
      I think "pretty soon" is a serious overstatement. This does not take into account the difficulty in formalizing definitions and theorem statements. This cannot be done autonomously (or, it can, but there will be serious errors) since there is no way to formalize the "text to Lean" process.
      What's more, there's almost surely going to turn out to be a large amount of human generated mathematics that's "basically" correct, in the sense that there exists a formal proof that morally fits the arc of the human proof, but there's informal/vague reasoning used (e.g. diagram arguments, etc) that is hard to really formalize, but that an expert can use consistently without making a mistake. This will take a long time to formalize, and I expect will require a large amount of human and AI effort.
      [-]
      - pfdietz 19 days ago
        It's all up for debate, but personally I feel you're being too pessimistic there. The advances being made are faster than I had expected. The area is one where success will build upon and accelerate success, so I expect the rate of advance to increase and continue increasing.
        This particular field seems ideal for AI, since verification enables identification of failure at all levels. If the definitions are wrong the theorems won't work and applications elsewhere won't work.
  - p-e-w 19 days ago
    Every time this topic comes up people compare the LLM to a search engine of some kind.
    But as far as we know, the proof it wrote is original. Tao himself noted that it’s very different from the other proof (which was only found now).
    That’s so far removed from a “search engine” that the term is essentially nonsense in this context.
    [-]
    - theptip 19 days ago
      Hassabis put forth a nice taxonomy of innovation: interpolation, extrapolation, and paradigm shifts.
      AI is currently great at interpolation, and in some fields (like biology) there seems to be low-hanging fruit for this kind of connect-the-dots exercise. A human would still be considered smart for connecting these dots IMO.
      AI clearly struggles with extrapolation, at least if the new datum is fully outside the training set.
      And we will have AGI (if not ASI) if/when AI systems can reliably form new paradigms. It’s a high bar.
    - davidhs 17 days ago
      Maybe if Terence Tao had memorized the entire Internet (and pretty much all media), then maybe he would find bits and pieces of the problem remind him of certain known solutions and be able to connect the dots himself.
      But, I don't know. I tend to view these (reasoning) LLMs as alien minds and my intuition of what is perhaps happening under the hood is not good.
      I just know that people have been using these LLMs as search engines (including Stephen Wolfram), browsing through what these LLMs perhaps know and have connected together.
- cubefox 20 days ago
  This illustrates how unimportant this problem is. A prior solution did exist, but apparently nobody knew because people didn't really care about it. If progress can be had by simply searching for old solutions in the literature, then that's good evidence the supposed progress is imaginary. And this is not the first time this has happened with an Erdős problem.
  A lot of pure mathematics seems to consist in solving neat logic puzzles without any intrinsic importance. Recreational puzzles for very intelligent people. Or LLMs.
  [-]
  - glemion43 20 days ago
    It shows that a 'llm' can now work on issues like this today and tomorrow it can do even more.
    Don't be so ignorant. A few years ago NO ONE could have come up with something so generic as an LLM which will help you to solve these kinds of problems and also create text adventures and Java code.
    [-]
    - danielbln 20 days ago
      The goal posts are strapped to skateboards these days, and the WD40 is applied to the wheels generously.
      [-]
      - sampullman 19 days ago
        Regular WD40 should not be used as bearing lubricant!
        [-]
        danielbln 19 days ago
        Exactly!
      - glemion43 19 days ago
        I don't get your pessimism...
        Nothing of it was even imaginable and yes the progress is crazy fast.
        How can you be so dismissive?
        [-]
        danielbln 19 days ago
        You misread my comment.
        [-]
        glemion43 19 days ago
        You mean like a small rocket build? Okay :)
    - BoredPositron 19 days ago
      You can just wait and verify instead of the publishing, redacting cycles of the last year. It's embarrassing.
  - jojobas 19 days ago
    It's hard to predict which maths result from 100 years ago surfaces in say quantum mechanics or cryptography.
    [-]
    - layer8 19 days ago
      The likelihood for that is vanishingly low, though, for any given math result.
  - antonvs 19 days ago
    > "intrinsic importance"
    "Intrinsic" in contexts like this is a word for people who are projecting what they consider important onto the world. You can't define it in any meaningful way that's not entirely subjective.
    [-]
    - cubefox 19 days ago
      Mathematical theorems at least have objectively lower information content, because they merely rule out the impossible, while scientific knowledge also rules out the possible but non-actual.
      [-]
      - antonvs 18 days ago
        You have it backwards. Mathematical theorems have objectively higher information content, because they rule out the impossible and model possibilities in all possible worlds that satisfy their preconditions. Scientific knowledge can never do more than inductive projections from observations in the single world we have physical access to.
        The only thing that saves science from being nothing more than “huh, will you look at that,” is when it can make use of a mathematical model to provide insight into relationships between phenomena.
  - MattGaiser 20 days ago
    There is still enormous value in cleaning up the long tail of somewhat important stuff. One of the great benefits of Claude Code to me is that smaller issues no longer rot in backlogs, but can be at least attempted immediately.
    [-]
    - cubefox 20 days ago
      The difference is that Claude Code actually solves practical problems, but pure (as opposed to applied) mathematics doesn't. Moreover, a lot of pure mathematics seems to be not just useless, but also without intrinsic epistemic value, unlike science. See https://news.ycombinator.com/item?id=46510353
      [-]
      - drob518 19 days ago
        I’m an engineer, not a mathematician, so I definitely appreciate applied math more than I do abstract math. That said, that’s my personal preference and one of the reasons that I became an engineer and not a mathematician. Working on nothing but theory would bore me to tears. But I appreciate that other people really love that and can approach pure math and see the beauty. And thank God that those people exist because they sometimes find amazing things that we engineers can use during the next turn of the technological crank. Instead of seeing pure math as useless, perhaps shift to seeing it as something wonderful for which we have not YET found a practical use.
        [-]
        Ar-Curunir 19 days ago
        Even if pure math is useless, that’s still okay. We do plenty of things that are useless. Not everything has to have a use.
        [-]
        drob518 19 days ago
        I’m not sure I agree. Pure math is not useless because a lot of math is very useful. But we don’t know ahead of time what is going to be useless vs. useful. We need to do all of it and then sort it out later.
        If we knew that it was all going to be useless, however, then it’s a hobby for someone, not something we should be paying people to do. Sure, if you enjoy doing something useless, knock yourself out… but on your own dime.
      - jstanley 20 days ago
        Applications for pure mathematics can't necessarily be known until the underlying mathematics is solved.
        Just because we can't imagine applications today doesn't mean there won't be applications in the future which depend on discoveries that are made today.
        [-]
        cubefox 19 days ago
        Well, read the linked comment. The possible future applications of useless science can't be known either. I still argue that it has intrinsic value apart from that, unlike pure mathematics.
        [-]
        Thorrez 19 days ago
        There are many cases where pure mathematics became useful later.
        https://www.reddit.com/r/math/comments/dfw3by/is_there_any_e...
        [-]
        cubefox 19 days ago
        So what? There are probably also many cases where seemingly useless science became useful later.
        [-]
        glenstein 19 days ago
        Exactly, you're almost getting it. Hence the value of "pure" research in both science and math.
        [-]
        cubefox 19 days ago
        You are not yet getting it I'm afraid. The point of the linked post was that, even assuming an equal degree of expected uselessness, scientific explanations have intrinsic epistemic value, while proving pure math theorems hasn't.
        [-]
        glenstein 19 days ago
        I think you lost track of what I was replying to. Thorrez noted that "There are many cases where pure mathematics became useful later." You replied by saying "So what? There are probably also many cases where seemingly useless science became useful later." You seemed to be treating the latter as if it negated the former which doesn't follow. The utility of pure math research isn't negated by noting there's also value in pure science research, any more than "hot dogs are tasty" is negated by replying "so what? hamburgers are also tasty". That's the point you made, and that's what I was responding to, and I'm not confused on this point despite your insistence to the contrary.
        Instead of addressing any of that you're insisting I'm misunderstanding and pointing me back to a linked comment of yours drawing a distinction between epistemic value of science research vs math research. Epistemic value counts for many things, but one thing it can't do is negate the significance of pure math turning into applied research on account of pure science doing the same.
        [-]
        cubefox 19 days ago
        "You replied by saying "So what? There are probably also many cases where seemingly useless science became useful later." You seemed to be treating the latter as if it negated the former"
        No, "so what" doesn't indicate disagreement, just that something isn't relevant.
        Anyway, assume hot dogs taste not good at all, except in rare circumstances. It would then be wrong to say "hot dogs taste good", but it would be right to say "hot dogs don't taste good". Now substitute pure math for hot dogs. Pure math can be generally useless even if it isn't always useless. Men are taller than women. That's the difference between applied and pure math. The difference between math and science is something else: Even useless science has value, while most useless math (which consists of pure math) doesn't. (I would say the axiomatization of new theories, like probability theory, can also have inherent value, independent of any uselessness, insofar as it is conceptual progress, but that's different from proving pure math conjectures.)
        [-]
        cwnyth 19 days ago
        It really speaks to the weakness of your original claim that you're applying this level of sophistry to your backpedaling.
        [-]
        cubefox 19 days ago
        There are 1135 Erdős problems. The solution to how many of them do you expect to be practically useless? 99%? More? 100%? Calling something useful merely because it might be in rare exceptions is the real sophistry.
        glenstein 16 days ago
        So when you said "so what, hamburgers (science) taste good (is useful)", you were implicitly making a point about how bad (mostly not useful) the hot dogs (math research) was? And that's the thing that supposedly wasn't being followed on the first pass?
        That brings us full circle, because you're now saying you were using one to negate the other, yet you were claiming that interpretation was a "failure to follow" what you were saying the first time around.
      - teiferer 20 days ago
        It's hard to know beforehand. Like with most foundational research.
        My favorite example is number theory. Before cryptography came along it was pure math, an esoteric branch for just number nerds. Turns out, super applicable later on.
      - baq 20 days ago
        You’re confusing immediately useful with eventually useful. Pure maths has found very practical applications over the millennia - unless you don’t consider it pure anymore, at which point you’re just moving goalposts.
        [-]
        cubefox 20 days ago
        No, I'm not confusing that. Read the linked comment if you're interested.
        [-]
        TheOtherHobbes 20 days ago
        You are confusing that. The biggest advancements in science are the result of the application of leading-edge pure math concepts to physical problems. Newtonian physics, relativistic physics, quantum field theory, Boolean computing, Turing notions of devices for computability, elliptic-curve cryptography, and electromagnetic theory all derived from the practical application of what was originally abstract math play.
        Among others.
        Of course you never know which math concept will turn out to be physically useful, but clearly enough do that it's worth buying conceptual lottery tickets with the rest.
        [-]
        glenstein 19 days ago
        Just to throw in another one, string theory was practically nothing but a basic research/pure research program unearthing new mathematical objects which drove physics research and vice versa. And unfortunately for the haters, string theory has borne real fruit with holography, producing tools for important predictions in plasma physics and black hole physics among other things. I feel like culture hasn't caught up to the fact that holography is now the gold rush frontier that has everyone excited that it might be our next big conceptual revolution in physics.
        cubefox 19 days ago
        There is a difference between inventing/axiomatizing new mathematical theories and proving conjectures. Take the Riemann hypothesis (the big daddy among the pure math conjectures), and assume we (or an LLM) prove it tomorrow. How high do you estimate the expected practical usefulness of that proof?
        [-]
        glenstein 19 days ago
        That's an odd choice, because prime numbers routinely show up in important applications in cryptography. To actually solve RH would likely involve developing new mathematical tools which would then be brought to bear on deployment of more sophisticated cryptography. And solving it would be valuable in its own right, a kind of mathematical equivalent to discovering a fundamental law in physics which permanently changes what is known to be true about the structure of numbers.
        Ironically this example turns out to be a great object lesson in not underestimating the utility of research based on an eyeball test. But it shouldn't even have to have any intuitively plausible payoff whatsoever in order to justify it. The whole point is that even if a given research paradigm completely failed the eyeball test, our attitude should still be that it very well could have practical utility, and there are so many historical examples to this effect (the other commenter already gave several examples, and the right thing to do would have been acknowledge them), and besides I would argue they still have the same intrinsic value that any and all knowledge has.
        [-]
        cubefox 19 days ago
        > To actually solve RH would likely involve developing new mathematical tools which would then be brought to bear on deployment of more sophisticated cryptography.
        I doubt that this is true.
        [-]
        glenstein 19 days ago
        It already has! The progress that's been made thus far involved the development of new ways to probabilistically estimate the density of primes, which in turn have already been used in cryptography for secure key generation, based on a deeper understanding of how to quickly and efficiently find large prime numbers.
      - amazingman 20 days ago
        It's unclear to me what point you are making.
- threethirtytwo 20 days ago
  This is a relief, honestly. A prior solution exists now, which means the model didn’t solve anything at all. It just regurgitated it from the internet, which we can retroactively assume contained the solution in spirit, if not in any searchable or known form. Mystery resolved.
  This aligns nicely with the rest of the canon. LLMs are just stochastic parrots. Fancy autocomplete. A glorified Google search with worse footnotes. Any time they appear to do something novel, the correct explanation is that someone, somewhere, already did it, and the model merely vibes in that general direction. The fact that no human knew about it at the time is a coincidence best ignored.
  The same logic applies to code. “Vibe coding” isn’t real programming. Real programming involves intuition, battle scars, and a sixth sense for bugs that can’t be articulated but somehow always validates whatever I already believe. When an LLM produces correct code, that’s not engineering, it’s cosplay. It didn’t understand the problem, because understanding is defined as something only humans possess, especially after the fact.
  Naturally, only senior developers truly code. Juniors shuffle syntax. Seniors channel wisdom. Architecture decisions emerge from lived experience, not from reading millions of examples and compressing patterns into a model. If an LLM produces the same decisions, it’s obviously cargo-culting seniority without having earned the right to say “this feels wrong” in a code review.
  Any success is easy to dismiss. Data leakage. Prompt hacking. Cherry-picking. Hidden humans in the loop. And if none of those apply, then it “won’t work on a real codebase,” where “real” is defined as the one place the model hasn’t touched yet. This definition will be updated as needed.
  Hallucinations still settle everything. One wrong answer means the whole system is fundamentally broken. Human mistakes, meanwhile, are just learning moments, context switches, or coffee shortages. This is not a double standard. It’s experience.
  Jobs are obviously safe too. Software engineering is mostly communication, domain expertise, and navigating ambiguity. If the model starts doing those things, that still doesn’t count, because it doesn’t sit in meetings, complain about product managers, or feel existential dread during sprint planning.
  So yes, the Erdos situation is resolved. Nothing new happened. No reasoning occurred. Progress remains hype. The trendline is imaginary. And any discomfort you feel is probably just social media, not the ground shifting under your feet.
  [-]
  - eru 20 days ago
    > This is a relief, honestly. A prior solution exists now, which means the model didn’t solve anything at all. It just regurgitated it from the internet, which we can retroactively assume contained the solution in spirit, if not in any searchable or known form. Mystery resolved.
    Vs
    > Interesting that in Terrance Tao's words: "though the new proof is still rather different from the literature proof)"
  - catoc 20 days ago
    I firmly believe @threethirtytwo’s reply was not produced by an LLM
    [-]
    - mkarliner 20 days ago
      Regardless of whether this text was written by an LLM or a human, it is still slop, with a human behind it just trying to wind people up. If there is a valid point to be made, it should be made briefly.
      [-]
      - catoc 20 days ago
        If the point was triggering a reply, the length and sarcasm certainly worked.
        I agree brevity is always preferred. Making a good point while keeping it brief is much harder than rambling on.
        But length is just a measure, quality determines if I keep reading. If a comment is too long, I won’t finish reading it. If I kept reading, it wasn’t too long.
  - johnfn 20 days ago
    I suspect this is AI generated, but it’s quite high quality, and doesn’t have any of the telltale signs that most AI generated content does. How did you generate this? It’s great.
    [-]
    - AstroBen 20 days ago
      Their comments are full of "it's not x, it's y" over and over. Short pithy sentences. I'm quite confident it's AI written, maybe with a more detailed prompt than the average
      I guess this is the end of the human internet
      [-]
      - prussia 20 days ago
        To give them the benefit of the doubt, people who talk to AI too much probably start mimicking its style.
      - 4k93n2 20 days ago
        yea, i was suspicious by the second paragraph but was sure once i got to "that’s not engineering, it’s cosplay"
        [-]
        AstroBen 20 days ago
        It's also the wording. The weird phrases
        "Glorified Google search with worse footnotes" what on earth does that mean?
        AI has a distinct feel to it
        [-]
        lxgr 20 days ago
        And with enough motivated reasoning, you can find AI vibes in almost every comment you don’t agree with.
        For better or worse, I think we might have to settle on “human-written until proven otherwise”, if we don’t want to throw “assume positive intent” out the window entirely on this site.
        testdelacc1 20 days ago
        Dude is swearing up and down that they came up with the text on their own. I agree with you though, it reeks of LLMs. The only alternative explanation is that they use LLMs so much that they’ve copied the writing style.
        plaguuuuuu 20 days ago
        I've had that exact phrase pop up from an LLM when I asked it for a more negative code review
    - threethirtytwo 20 days ago
      Your intuition on AI is out of date by about 6 months. Those telltale signs no longer exist.
      It wasn't AI generated. But if it was, there is currently no way for anyone to tell the difference.
      [-]
      - catlifeonmars 20 days ago
        I’m confused by this. I still see this kind of phrasing in LLM generated content, even as recent as last week (using Gemini, if that matters). Are you saying that LLMs do not generate text like this, or that it’s now possible to get text that doesn’t contain the telltale “its not X, it’s Y”?
      - comp_throw7 20 days ago
        > But if it was there is currently no way for anyone to tell the difference.
        This is false. There are many human-legible signs, and there do exist fairly reliable AI detection services (like Pangram).
        [-]
        int_19h 18 days ago
        There are no reliable AI detection services. At best they can reliably detect output from popular chatbots running with their default prompts. Beyond that reliability deteriorates rapidly so they either err on the side of many false positives, or on the side of many false negatives.
        There's already been several scandals where students were accused of AI use on the basis of these services and successfully fought back.
        threethirtytwo 20 days ago
        I've tested some of those services and they weren't very reliable.
        CamperBob2 19 days ago
        If such a thing did exist, it would exist only until people started training models to hide from it.
        Negative feedback is the original "all you need."
      - velox_neb 20 days ago
        > It wasn't AI generated.
        You're lying: https://www.pangram.com/history/94678f26-4898-496f-9559-8c4c...
        Not that I needed pangram to tell me that, it's obvious slop.
        [-]
        threethirtytwo 20 days ago
        I wouldn't know how to prove otherwise to you, other than to tell you that I have seen these tools show incorrect results for both AI generated text and human written text.
        lxgr 20 days ago
        Good thing you had a stochastic model backing up (with “low confidence”, no less) your vague intuition of a comment you didn’t like being AI-written.
        XenophileJKO 20 days ago
        I must be a bot because I love existential dread, that's a great phrase. I feel like they trigger a lot on literate prose.
        [-]
        lxgr 20 days ago
        Sad times when the only remaining way to convince LLM luddites of somebody’s humanity is bad writing.
    - georgeven 18 days ago
      [dead]
    - CamperBob2 20 days ago
      (edit: removed duplicate comment from above, not sure how that happened)
      [-]
      - undeveloper 20 days ago
        the poster is in fact being very sarcastic. arguing in favor of emergent reasoning does in fact make sense
      - threethirtytwo 20 days ago
        It's a formal sarcasm piece.
    - CamperBob2 20 days ago
      It's bizarre. The same account was previously arguing in favor of emergent reasoning abilities in another thread ( https://news.ycombinator.com/item?id=46453084 ) -- I voted it up, in fact! Turing test failed, I guess.
      (edit: fixed link)
      [-]
      - threethirtytwo 20 days ago
        I thought the mockery and sarcasm in my piece was rather obvious.
        [-]
        CamperBob2 20 days ago
        Poe's Law is the real Bitter Lesson.
      - habinero 20 days ago
        We need a name for the much more trivial version of the Turing test that replaces "human" with "weird dude with rambling ideas he clearly thinks are very deep"
        I'm pretty sure it's like "can it run DOOM" and someone could make an LLM that passes this that runs on a pregnancy test
  - magnio 20 days ago
    Pity that HN's ability to detect sarcasm is as robust as that of a sentiment analysis model using keyword-matching.
    [-]
    - furyofantares 20 days ago
      The problem is more that it's an LLM-generated comment that's about 20x as long as it needed to be to get the point across.
      [-]
      - cubefox 20 days ago
        It's obviously not LLM-generated.
        [-]
        kleene_op 20 days ago
        Phew. This is a relief, honestly!
      - threethirtytwo 20 days ago
        It's not.
        Evidence shows otherwise: Despite the "20x" length, many people actually missed the point.
        [-]
        eru 20 days ago
        Despite or because?
        furyofantares 19 days ago
        Oh yeah, there is also a problem with people not noticing they're reading LLM output, AND with people missing sarcasm on here. Actually, I'm OK with people missing sarcasm on here - I have plenty of places to go for sarcasm and wit and it's actually kind of nice to have a place where most posts are sincere, even if that sets people up to miss it when posts are sarcastic.
        Which is also what makes it problematic that you're lying about your LLM use. I would honestly love to know your prompt and how you iterated on the post, how much you put into it and how much you edited or iterated. Although pretending there was no LLM involved at all is rather disappointing.
        Unfortunately I think you might feel backed into a corner now that you've insisted otherwise but it's a genuinely interesting thing here that I wish you'd elaborate on.
        _diyar 20 days ago
        I definitely missed the point because of the length, and only realized after I read replies to your comment.
        [-]
        threethirtytwo 20 days ago
        Next time I'll write something shorter, or if you don't believe I wrote it... then I'll tell the AI to write something shorter.
        quinnjh 20 days ago
        Its not just verbose—it's almost a novel. Parent either cooked and capped, or has managed to perfectly emulate the patterns this parrot is stochastically known best for. I liked the pro human vibe if anything.
    - catlifeonmars 20 days ago
      That’s just the internet. Detecting sarcasm requires a lot of context external to the content of any text. In person some of that is mitigated by intonation, facial expressions, etc. Typically it also requires that the reader is a native speaker of the language or at least extremely proficient.
    - dang 19 days ago
      I'm more worried that the best LLMs aren't yet good enough to classify satire reliably.
  - nurettin 20 days ago
    Why not plan for a future where a lot of non-trivial tasks are automated instead of living on the edge with all this anxiety?
    [-]
    - threethirtytwo 20 days ago
      [flagged]
      [-]
      - undeveloper 20 days ago
        come out of the irony layer for a second -- what do you believe about LLMs?
      - jorvi 20 days ago
        I mean.. LLMs have hit a pretty hard wall a while ago, with the only solution being throwing monstrous compute at eking out the remaining few percent improvement (real world, not benchmarks). That's not to mention hallucinations / false paths being a foundational problem.
        LLMs will continue to get slightly better in the next few years, but mainly a lot more efficient. Which will also mean better and better local models. And grounding might get better, but that just means less wrong answers, not better right answers.
        So no need for doomerism. The people saying LLMs are a few years away from eating the world are either in on the con or unaware.
      - 7777332215 20 days ago
        If all of it is going away and you should deny reality, what does everything else you wrote even mean?
      - habinero 20 days ago
        Yes, it is simply impossible that anyone could look at things and do your own evaluations and come to a different, much more skeptical conclusion.
        The only possible explanation is people say things they don't believe out of FUD. Literally the only one.
  - rixed 20 days ago
    Are you expecting people who can't detect self-delusions to be able to detect sarcasm, or are you just being cruel?
doctoboggan 20 days ago
Can anyone give a little more color on the nature of Erdos problems? Are these problems that many mathematicians have spent years tackling with no result? Or do some of the problems evade scrutiny and go un-attempted for most of the time?
EDIT: After reading a link someone else posted to Terence Tao's wiki page, he has a paragraph that somewhat answers this question:
> Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools. Unfortunately, it is hard to tell in advance which category a given problem falls into, short of an expert literature review. (However, if an Erdős problem is only stated once in the literature, and there is scant record of any followup work on the problem, this suggests that the problem may be of the second category.)
from here: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
[-]
- QuesnayJr 20 days ago
  Erdos was an incredibly prolific mathematician, and one of his quirks is that he liked to collect open problems and state new open problems as a challenge to the field. Many of the problems he attached bounties to, from $5 to $10,000.
  The problems are a pretty good metric for AI, because the easiest ones at least meet the bar of "a top mathematician didn't know how to solve this off the top of his head" and the hardest ones are major open problems. As AI progresses, we will see it slowly climb the difficulty ladder.
- heliumtera 19 days ago
  Don't feel bad for being out of the loop. The author and Tao did not care enough about the Erdos problem to realize the proof was published by Erdos himself. So you never cared enough and neither did they. But they care about screaming LLM breakthroughs on the fediverse and Twitter.
  [-]
  - wasabi991011 19 days ago
    > Did not care enough about erdos...
    This is bad faith. Erdos was an incredibly prolific mathematician, it is unreasonable to expect anyone to have memorized his entire output. Yet, Tao knows enough about Erdos to know which mathematical techniques he regularly used in his proofs.
    From the forum thread about Erdos problem 281:
    > I think neither the Birkhoff ergodic theorem nor the Hardy-Littlewood maximal inequality, some version of either was the key ingredient to unlock the problem, were in the regular toolkit of Erdos and Graham (I'm sure they were aware of these tools, but would not instinctively reach for them for this sort of problem). On the other hand, the aggregate machinery of covering congruences looks relevant (even though ultimately it turns out not to be), and was very much in the toolbox of these mathematicians, so they could have been misled into thinking this problem was more difficult than it actually was due to a mismatch of tools.
    > I would assess this problem as safely within reach of a competent combinatorial ergodic theorist, though with some thought required to figure out exactly how to transfer the problem to an ergodic theory setting. But it seems the people who looked at this problem were primarily expert in probabilistic combinatorics and covering congruences, which turn out to not quite be the right qualifications to attack this problem.
    [-]
    - heliumtera 19 days ago
      Isn't it bad faith to say no prior solutions were found when a solution published by Erdos was ultimately found by the community in 10 minutes?
      [-]
      - wasabi991011 19 days ago
        Maybe, that's a decent point. I didn't realize it was that quick, I would have appreciated you mentioning that in your previous comment.
        It does beg the question, if it was so easy to find the prior solution, why has no one posted it already on the erdos problems website?
        [-]
        heliumtera 19 days ago
        That sounds like a great question. Why did no one bother to mention the problem was already proved and published by the author that proposed the statement 90 years ago?
        Somehow an LLM-generated proof that consists of gigabytes upon gigabytes of unreadable mess is groundbreaking and pushes mathematics forward, while a proof proposed by Erdos himself in 5 pages gets buried and lost to time.
        Maybe one particular optics fuels the narrative that formal verified compute is the new moat and llms are amazing at that?
        [-]
        DroneBetter 19 days ago
        the proofs written by ChatGPT are necessarily reasoned about in plain language, and are a human-comprehensible length (that is what Tao did, since it hasn't been formalised in a proof-checking language); today, the many-gigabytes (or -terabytes) proofs (à la 4-colour theorem) are generally problems solved via SAT solvers that are required to prove nonexistence of smaller solutions by exhaustion.
        and there is an ongoing literature review (which has been lucrative to both erdosproblems and the OEIS), and this one was relabelled upon the discovery of an earlier resolution
  - nddkdkfk 19 days ago
    This Tao dude, does he get invited to a lot of AI conferences (accommodation included)?
    [-]
    - wasabi991011 19 days ago
      He's the most prolific and famous modern mathematician. I'm pretty sure that even if he'd never touched AI, he would be invited to more conferences than he could ever attend.
      [-]
      - nddkdkfk 19 days ago
        [flagged]
        [-]
        wasabi991011 19 days ago
        Please follow hackernews guidelines for comments: https://news.ycombinator.com/newsguidelines.html
    - _fizz_buzz_ 19 days ago
      I know someone who organized a conference where he spoke (this was before the AI boom, probably around 2018 or so) and he got very good accommodations and also a very generous speaking fee.
pessimist 20 days ago
From Terry Tao's comments in the thread:
"Very nice! ... actually the thing that impresses me more than the proof method is the avoidance of errors, such as making mistakes with interchanges of limits or quantifiers (which is the main pitfall to avoid here). Previous generations of LLMs would almost certainly have fumbled these delicate issues.
...
I am going ahead and placing this result on the wiki as a Section 1 result (perhaps the most unambiguous instance of such, to date)"
The pace of change in math is going to be something to watch closely. Many minor theorems will fall. Next major milestone: Can LLMs generate useful abstractions?
[-]
- radioactivist 20 days ago
  Seems like someone dug something up from the literature on this problem (see top comment on the erdosproblems.com thread)
  "On following the references, it seems that the result in fact follows (after applying Rogers' theorem) from a 1936 paper of Davenport and Erdos (!), which proves the second result you mention. ... In the meantime, I am moving this problem to Section 2 on the wiki (though the new proof is still rather different from the literature proof)."
dust42 20 days ago
Personally, I'd prefer if the AI models would start with a proof of their own statements. Time and again, SOTA frontier models told me: "Now you have 100% correct code ready for production in enterprise quality." Then I run it and it crashes. Or maybe the AI is just being tongue-in-cheek?
Case in point: I just wanted to give z.ai a try and buy some credits. I used Firefox with uBlock and the payment didn't go through. I tried again with Chrome and no adblock, but now there is an error: "Payment Failed: p.confirmCardPayment is not a function." The irony is that this is certainly vibe-coded with z.ai, which tries to sell me on how good it is but then isn't able to conclude the sale.
And we will get lots more of this in the future. LLMs are a fantastic new technology, but even more fantastically over-hyped.
[-]
- becquerel 20 days ago
  You get AIs to prove their code is correct in precisely the same ways you get humans to prove their code is correct. You make them demonstrate it through tests or evidence (screenshots, logs of successful runs).
  [-]
  - judahmeek 19 days ago
    Yes! Also, make sure to check those results yourself, dear reader, rather than ask the agent to summarize the results for you! ^^;
- killerstorm 19 days ago
  We should differentiate AI models from AI apps.
  Models just generate text. Apps are supposed to make that text useful.
  An app can run various kinds of verification. But would you pay an extra for that?
  Nobody can make a text generator to output text which is 100% correct. That's just not a thing people can do now.
carbocation 20 days ago
The erdosproblems thread itself contains comments from Terence Tao: https://www.erdosproblems.com/forum/thread/281
redbluered 20 days ago
Has anyone verified this?
I've "solved" many math problems with LLMs, with LLMs giving full confidence in subtly or significantly incorrect solutions.
I'm very curious here. The Open AI memory orders and claims about capacity limits restricting access to better models are interesting too.
[-]
- bpodgursky 20 days ago
  Terence Tao gave it the thumbs up. I don't think you're going to do better than that.
  [-]
  - bparsons 20 days ago
    It's already been walked back.
    [-]
    - energy123 20 days ago
      Not in the sense of being a "subtly or significantly incorrect solution".
sequin 20 days ago
FWIW, I just gave Deepseek the same prompt and it solved it too (much faster than the 41m of ChatGPT). I then gave both proofs to Opus and it confirmed their equivalence.
The answer is yes. Assume, for the sake of contradiction, that there exists an $\epsilon > 0$ such that for every $k$, there exists a choice of congruence classes $a_1^{(k)}, \dots, a_k^{(k)}$ for which the set of integers not covered by the first $k$ congruences has density at least $\epsilon$.
For each $k$, let $F_k$ be the set of all infinite sequences of residues $(a_i)_{i=1}^\infty$ such that the uncovered set from the first $k$ congruences has density at least $\epsilon$. Each $F_k$ is nonempty (by assumption) and closed in the product topology (since it depends only on the first $k$ coordinates). Moreover, $F_{k+1} \subseteq F_k$ because adding a congruence can only reduce the uncovered set. By the compactness of the product of finite sets, $\bigcap_{k \ge 1} F_k$ is nonempty.
Choose an infinite sequence $(a_i) \in \bigcap_{k \ge 1} F_k$. For this sequence, let $U_k$ be the set of integers not covered by the first $k$ congruences, and let $d_k$ be the density of $U_k$. Then $d_k \ge \epsilon$ for all $k$. Since $U_{k+1} \subseteq U_k$, the sets $U_k$ are decreasing and periodic, and their intersection $U = \bigcap_{k \ge 1} U_k$ has density $d = \lim_{k \to \infty} d_k \ge \epsilon$. However, by hypothesis, for any choice of residues, the uncovered set has density $0$, a contradiction.
Therefore, for every $\epsilon > 0$, there exists a $k$ such that for every choice of congruence classes $a_i$, the density of integers not covered by the first $k$ congruences is less than $\epsilon$.
\boxed{\text{Yes}}
[-]
- CGamesPlay 20 days ago
  > I then gave both proofs to Opus and it confirmed their equivalence.
  You could have just rubber-stamped it yourself, for all the mathematical rigor it holds. The devil is in the details, and the smallest problem unravels the whole proof.
  [-]
  - yosefk 20 days ago
    How dare you question the rigor of the venerable LLM peer review process! These are some of the most esteemed LLMs we are talking about here.
    [-]
    - falcor84 19 days ago
      It's about formalization in Lean, not peer review
- Davidzheng 20 days ago
  "Since $U_{k+1} \subseteq U_k$, the sets $U_k$ are decreasing and periodic, and their intersection $U = \bigcap_{k \ge 1} U_k$ has density $d = \lim_{k \to \infty} d_k \ge \epsilon$."
  Is this enough? Let $U_k$ be the set of integers such that their remainder mod 6^n is greater than or equal to 2^n for all 1<n<k. Density of each $U_k$ is more than 1/2 I think but not the intersection (empty) right?
  [-]
  - Paracompact 20 days ago
    Indeed. Your sets are decreasing periodic of density always greater than the product from k=1 to infinity of (1-(1/3)^k), which is about 0.56, yet their intersection is null.
    This would all be a fairly trivial exercise in diagonalization if such a lemma as implied by Deepseek existed.
    (Edit: The bounding I suggested may not be precise at each level, but it is asymptotically the limit of the sequence of densities, so up to some epsilon it demonstrates the desired counterexample.)
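A minimal sketch for checking the counterexample above numerically. It rests on two assumptions: "integers" is read as non-negative integers (the usual setting for natural density), and the index range "1<n<k" is read as n = 2, ..., k-1. The function names are hypothetical, and with this index convention the exact densities need not match the ~0.56 figure quoted above, which uses a product starting at k=1. Each U_k is periodic with period 6^(k-1), so its density can be counted exactly over one period.

```python
from fractions import Fraction


def in_U(m: int, k: int) -> bool:
    """True if m mod 6^n >= 2^n for every n with 1 < n < k."""
    return all(m % 6**n >= 2**n for n in range(2, k))


def density(k: int) -> Fraction:
    """Exact density of U_k, counted over one full period 6^(k-1)."""
    period = 6 ** max(k - 1, 0)
    return Fraction(sum(in_U(m, k) for m in range(period)), period)


def first_excluding_level(m: int) -> int:
    """Smallest n >= 2 with m mod 6^n < 2^n (assumes m >= 0).

    Terminates because once 2^n > m we have m mod 6^n = m < 2^n.
    """
    n = 2
    while m % 6**n >= 2**n:
        n += 1
    return n


# Densities stay bounded away from 0, yet every m >= 0 drops out eventually.
for k in range(3, 7):
    print(k, float(density(k)))
print([first_excluding_level(m) for m in (0, 5, 100, 4999)])
```

The point of the sketch is only that "decreasing periodic sets with densities at least epsilon" does not by itself give an intersection of density at least epsilon, which is the step the Deepseek argument takes for granted.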
- Klover 20 days ago
  Here's kimi-k2-thinking with the reasoning block included: https://www.kimi.com/share/19bcfe2e-d9a2-81fe-8000-00002163c...
- nsoonhui 20 days ago
  I am not familiar with the field, but is there any chance that Deepseek is just memorizing the existing solution? Or is it different?
  https://news.ycombinator.com/item?id=46664976
  [-]
  - utopiah 20 days ago
    Sure, but if so, wouldn't ChatGPT 5.2 Pro also be "just memorizing the existing solution"?
    [-]
    - nsoonhui 20 days ago
      No it's not, you can refer to my link and subsequent discussion.
      [-]
      - utopiah 20 days ago
        I don't see what's related there but anyway unless you have access to information from within OpenAI I don't see how you can claim what was or wasn't in the training data of ChatGPT 5.2 Pro.
        On the contrary for DeepSeek you could but not for a non open model.
        [-]
        nsoonhui 20 days ago
        I am basing this on Terence Tao's comment here: https://news.ycombinator.com/item?id=46665168
        It says that the OpenAI proof is a different one from the published one in the literature.
        As for whether the Deepseek proof is the same as the published one, I don't know enough of the math to judge.
        That was what I meant.
- logicchains 20 days ago
  Opus isn't a good choice for anything math-related; it's worse at math than the latest ChatGPT and Gemini Pro.
- amluto 20 days ago
  I find it interesting that, as someone utterly unfamiliar with ergodic theory, Dini’s theorem, etc, I find Deepseek’s proof somewhat comprehensible, whereas I do not find GPT-5.2’s proof comprehensible at all. I suspect that I’d need to delve into the terminology in the GPT proof if I tried to verify Deepseek’s, so maybe GPT’s is being more straightforward about the underlying theory it relies on?
Eufrat 20 days ago
There was a post about Erdős 728 being solved with Harmonic’s Aristotle a little over a week ago [1] and that seemed like a good example of using state-of-the-art AI tech to help increase velocity in this space.
I’m not sure what this proves. I dumped a question into ChatGPT 5.2 and it produced a correct response after almost an hour [2]?
Okay? Is it repeatable? Why did it come up with this solution? How did it come up with the connections in its reasoning? I get that it looks correct and Tao’s approval definitely lends credibility that it is a valid solution, but what exactly is it that we’ve established here? That the corpus that ChatGPT 5.2 was trained on is better tuned for pure math?
I’m just confused what one is supposed to take away from this.
[1] https://news.ycombinator.com/item?id=46560445
[2] https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
[-]
- Coeur 19 days ago
  Also #124 was proved using AI 49 days ago: https://news.ycombinator.com/item?id=46094037
- vessenes 19 days ago
  Thanks for the curious question. This is one in a sequence of efforts to use LLMs to generate candidate proofs to open mathematical questions, which then are generally formalized into Lean, a formal proof system for pure mathematics.
  Erdos was prolific and many of his open problems are numbered and have space to discuss them online, so it’s become fairly common to run through them with frontier models and see if a good proof can be come up with; there have been some notable successes here this year.
  Tao seems to engage in sort of a two step approach with these proofs - first, are they correct? Lean formalization makes that unambiguous, but not all proofs are easily formulated into Lean, so he also just, you know, checks them. Second, literature search inside LLMs and out for prior results — this is to check where frontier models are at in the ‘novel proofs or just regurgitated proofs’ space.
  To my knowledge, we’re currently at the point where we are seeing some novel proofs offered, but I don’t think we’ve seen any that have absolutely no priors in literature.
  As you might guess this is itself sort of a Rorschach test for what AI could and will be.
  In this case, it looked at first like this was a totally novel solution to something that hadn’t been solved before. On deeper search, Tao noted it’s almost trivial to prove with stuff Erdos knew, and also had been proved independently; this proof doesn’t use the prior proof mechanism though.
energy123 20 days ago
A surprising % of these LLM proofs are coming from amateurs.
One wonders if some professional mathematicians are instead choosing to publish LLM proofs without attribution for career purposes.
[-]
- kristopolous 20 days ago
  It's probably from the perennial observation
  "This LLM is kinda dumb in the thing I'm an expert in"
  [-]
  - fatherwavelet 20 days ago
    This is just not true at this point but believe whatever you want to believe.
    [-]
    - fatata123 20 days ago
      [dead]
  - Workaccount2 19 days ago
    Perennial doesn't make sense in the context of something that has been around for a few months. Observations from the spring 2025 crop of LLMs are already irrelevant.
  - vessenes 19 days ago
    … “but I guess it was able to formalize it in Lean, so…”
- Workaccount2 19 days ago
  >One wonders if some professional mathematicians are instead choosing to publish LLM proofs without attribution for career purposes.
  This will just become the norm as these models improve, if it isn't largely already the case.
  It's like sports where everyone is trying to use steroids, because the only way to keep up is to use steroids. Except there aren't any AI-detectors and it's not breaking any rules (except perhaps some kind of self moral code) to use AI.
- mlpoknbji 19 days ago
  I think a more realistic answer is that professional mathematicians have tried to get LLMs to solve their problems and the LLMs have not been able to make any progress.
  [-]
  - Davidzheng 19 days ago
    I think it's a bit early to tell whether GPT 5.2 has helped research mathematicians substantially given its recency. The models move so fast that even if all previous models were completely useless I wouldn't be sure this one would be. Let's wait a year and see? (it takes time to write papers)
    [-]
    - mlpoknbji 19 days ago
      It's helped, but it's not correct that mathematicians are scoring major results by just feeding their problems to GPT 5.2 Pro, so the OP claim that mathematicians are just passing off AI output as their own is silly. Here, I'm talking about serious mathematical work, not people posting unattributed AI slop to the arXiv.
      I assume OP was mostly joking, but we need to take care about letting AI companies hype up their impressive progress at the expense of mathematics. This needs to be discussed responsibly.
- Davidzheng 20 days ago
  I'm actually not sure what the right attribution method would be. I'd lean towards a single line in the acknowledgements? Because you can use it, for example, at every lemma during brainstorming, but it's unclear whether the right convention is to thank it at every lemma...
  Anecdotally, I, as a math postdoc, think that GPT 5.2 is much stronger qualitatively than anything else I've used. Its rate of hallucinations is low enough that I don't feel like the default assumption of any solution is that it is trying to hide a mistake somewhere. Compared with Gemini 3 whose failure mode when it can't solve something is always to pretend it has a solution by "lying"/ omitting steps/making up theorems etc... GPT 5.2 usually fails gracefully and when it makes a mistake it more often than not can admit it when pointed out.
ashleyn 20 days ago
I guess the first question I have is if these problems solved by LLMs are just low-hanging fruit that human researchers either didn't get around to or show much interest in - or if there's some actual beef here to the idea that LLMs can independently conduct original research and solve hard problems.
[-]
- utopiah 20 days ago
  That's the first warning from the wiki : <<Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools.>> https://github.com/teorth/erdosproblems/wiki/AI-contribution...
- dyauspitr 20 days ago
  There is still value in letting these LLMs loose on the periphery and knocking out all the low hanging fruit humanity hasn’t had the time to get around to. Also, I don’t know this, but if it is a problem on Erdos I presume people have tried to solve it at least a little bit before it makes it to the list.
  [-]
  - utopiah 20 days ago
    Is there though? If they are "solved" (as in the tickbox mark them as such, through a validation process, e.g. another model confirming, formal proof passing, etc) but there is no human actually learning from them, what's the benefit? Completing a list?
    I believe the ones that are NOT studied are precisely because they are seen as uninteresting. Even if they were to be solved in an interesting way, if nobody sees the proof because there are just too many and they are again not considered valuable then I don't see what is gained.
    [-]
    - vessenes 19 days ago
      Some problems are ‘uninteresting’ in that they show results that aren’t immediately seen as useful. However, solutions may end up having ‘interesting’ connections or ideas or mathematical tools that are used elsewhere.
      More broadly, I think there’s a perspective that literally just building out thousands more true statements in Lean is going to keep cementing math’s broadening knowledge framework. This is not building a giant castle a-la Wiles, it’s laying bricks in the outhouse, but someday those bricks might be useful.
    - ogogmad 19 days ago
      You don't see value in having a cheap way to detect when a problem is easy or hard? That would seem unimaginative.
a_tartaruga 20 days ago
Out of curiosity why has the LLM math solving community been focused on the Erdos problems over other open problems? Are they of a certain nature where we would expect LLMs to be especially good at solving them?
[-]
- krackers 20 days ago
  I guess they are at a difficulty where they're not too hard (unlike millennium prize problems), are fairly tightly scoped (unlike open ended research), and have some gravitas (so it's not some obscure theorem that's only unproven because of its lack of noteworthiness).
  [-]
  - Davidzheng 20 days ago
    I actually don't think the reason is that they are easier than other open math problems. I think it's more that they are "elementary" in the sense that the problems usually don't require a huge amount of domain knowledge to state.
    [-]
    - xigoi 20 days ago
      The Collatz conjecture can be stated using basic arithmetic, yet LLMs have not been able to solve it.
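      To illustrate how little machinery the statement needs, here is a minimal sketch of the Collatz iteration (halve even numbers, map odd n to 3n + 1). The function name `collatz_steps` is made up for this example. The conjecture is that the loop terminates for every positive integer; checking individual values like this proves nothing about the general case.

```python
def collatz_steps(n: int) -> int:
    """Count iterations of the Collatz map needed for n to reach 1.

    The map sends even n to n / 2 and odd n to 3n + 1. Termination for
    every positive n is exactly the open conjecture, so this loop is only
    known to stop for the inputs that have actually been checked.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


if __name__ == "__main__":
    for start in (6, 7, 27):
        print(start, collatz_steps(start))
```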
      [-]
      - Davidzheng 20 days ago
        I agree it's easier than Collatz. I just mean I am not sure it's much easier than many currently open questions which are less famous but need more machinery.
      - _fizz_buzz_ 19 days ago
        That is also one of the hardest problems.
- becquerel 20 days ago
  People like checking items off of lists.
wewxjfq 20 days ago
The LLMs that take 10 attempts to un-zero-width a <div>, telling me that every single change totally fixed the problem, are cracking the hardest math problems again.
[-]
- int_19h 18 days ago
  Math makes sense, CSS doesn't.
niemandhier 20 days ago
Is there explainability research for this type of model application? E.g. a sparse auto encoder or something similar but more modern.
I would love to know which concepts are active in the deeper layers of the model while generating the solution.
Is there a concept of “epsilon” or “delta”?
What are their projections on each other?
renewiltord 20 days ago
It’s funny: in some kind of twisted variant of Cunningham’s Law we have:
> the best way to find a previous proof of a seemingly open problem on the internet is not to ask for it; it's to post a new proof
zkmon 19 days ago
I wonder if they tried Gemini. I think Gemini could have done better, as seen from my experiences with GPT and Gemini models on some simple geometry problems.
charmpic 20 days ago
I'm looking forward to chatgpt 5.3pro. I also use chatgpt 5.2pro for various program consultations. It's been very helpful.
[-]
- vercaemert 20 days ago
  I was hoping there'd be more discussion about the model itself. I find the last couple of generations of Pro models fascinating.
  Personally, I've been applying them to hard OCR problems. Many varied languages concurrently, wildly varying page structure, and poor scan quality; my dataset has all of these things. The models take 30 minutes a page, but the accuracy is basically 100% (it'll still struggle with perfectly-placed bits of mold). The next best model (Google's flagship) rests closer to 80%.
  I'll be VERY intrigued to see what the next 2, 5, 10 years do to the price of this level of model.
- energy123 19 days ago
  We're eventually going to get it at cerebras inference latency. It's going to be wild.
heliumtera 19 days ago
>no prior solutions found.
They never bothered to check Erdos's solution already published 90 years ago. I am still confused about why Erdos, who proposed the problem and the solution, would consider this an unsolved problem, but this group of researchers would claim "ohh my god look at this breakthrough"
IAmGraydon 20 days ago
This is showing as unresolved here, so I'm assuming something was retracted.
https://mehmetmars7.github.io/Erdosproblems-llm-hunter/probl...
[-]
- nl 20 days ago
  I think that just hasn't been updated.
mikert89 20 days ago
I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
[-]
- anonzzzies 20 days ago
  HN will be the last place to admit it; people here seem to be holding out with the vague 'I tried it and it came up with crap'. While many of us are shipping software without touching (much) code anymore. I have written code for over 40 years and this is nothing like no-code or whatever 'replacing programmers' before, this is clearly different judging from the people who cannot code with a gun to their heads but still are shipping apps: it does not really matter if anyone believes me or not. I am making more money than ever with fewer people than ever delivering more than ever.
  We are very close.
  (by the way; I like writing code and I still do for fun)
  [-]
  - utopiah 20 days ago
    Both can be correct : you might be making a lot of money using the latest tools while others who work on very different problems have tried the same tools and it's just not good enough for them.
    The ability to make money proves you found a good market, it doesn't prove that the new tools are useful to others.
    [-]
    - lostmsu 19 days ago
      No, the comment is about "will", not "is". Of course there's no definitive proof of what will happen. But the writing is on the wall and the letters are so large now, that denying AI would take over coding if not all intellectual endeavors resembles the movie "Don't look up".
    - int_19h 18 days ago
      It is also very much a moving target. A year ago I tried those tools and they were very meh at the kinds of stuff I do. Today, they are much better.
  - fc417fc802 20 days ago
    > holding out with the vague 'I tried it and it came up with crap'
    Isn't that a perfectly reasonable metric? The topic has been dominated by hype for at least the past 5 if not 10 years. So when you encounter the latest in a long line of "the future is here the sky is falling" claims, where every past claim to date has been wrong, it's natural to try for yourself, observe a poor result, and report back "nope, just more BS as usual".
    If the hyped future does ever arrive then anyone trying for themselves will get a workable result. It will be trivially easy to demonstrate that naysayers are full of shit. That does not currently appear to be the case.
    [-]
    - danielbln 20 days ago
      What topic are you referring to? ChatGPT release was just over 3 years ago. 5 years ago we had basic non-instruct GPT-3.
      [-]
      - fc417fc802 20 days ago
        Wasn't transformer 2017? There's been constant AI hype since at least that far back and it's only gotten worse.
        If I release a claim once a month that armageddon will happen next month, and then after 20 years it finally does, are all of my past claims vindicated? Or was I spewing nonsense the entire time? What if my claim was the next big pandemic? The next 9.0 earthquake?
        [-]
        danielbln 20 days ago
        Transformers was 2017 and it had some implications on translation (which were in no way overstated), but it took GPT-2 and 3 to kick it off in earnest and the real hype machine started with ChatGPT.
        What you are doing however is dismissing the outrageous progress on NLP and by extension code generation of the last few years just because people over hype it.
        People over hyped the Internet in the early 2000s, yet here we are.
        [-]
        fc417fc802 20 days ago
        Well I've been seeing an objectionable amount of what I consider to be hype since at least transformers.
        I never dismissed the actual verifiable progress that has occurred. I objected specifically to the hype. Are you sure you're arguing with what I actually said as opposed to some position that you've imagined that I hold?
        > People over hyped the Internet in the early 2000s, yet here we are.
        And? Did you not read the comment you are replying to? If I make wild predictions and they eventually pan out does that vindicate me? Or was I just spewing nonsense and things happened to work out?
        "LLMs will replace developers any day now" is such a claim. If it happens a month from now then you can say you were correct. If it doesn't then it was just hype and everyone forgets about it. Rinse and repeat once every few months and you have the current situation.
    - visarga 20 days ago
      But the trend line is less ambiguous, models got better year over year, much much better.
      [-]
      - fc417fc802 20 days ago
        I don't dispute that the situation is rapidly evolving. It is certainly possible that we could achieve AGI in the near future. It is also entirely possible that we might not. Claims such as that AGI is close or that we will soon be replacing developers entirely are pure hype.
        When someone says something to the effect of "LLMs are on the verge of replacing developers any day now" it is perfectly reasonable to respond "I tried it and it came up with crap". If we were actually near that point you wouldn't have gotten crap back when you tried it for yourself.
        [-]
        jerkstate 19 days ago
        There's a big difference between "I tried it and it produced crap" and "it will replace developers entirely any day now"
        People who use this stuff everyday know that people who are still saying "I tried it and it produced crap" just don't know how to use it correctly. Those developers WILL get replaced - by ones who know how to use the tool.
        [-]
        fc417fc802 19 days ago
        > Those developers WILL get replaced - by ones who know how to use the tool.
        Now _that_ I would believe. But note how different "those who fail to adapt to this new tool will be replaced" is from "the vast majority will be replaced by this tool itself".
        If someone had said that six (give or take) months ago I would have dismissed it as hype. But there have been at least a few decently well documented AI assisted projects done by veteran developers that have made the front page recently. Importantly they've shown clear and undeniable results as opposed to handwaving and empty aspirations. They've also been up front about the shortcomings of the new tool.
        [-]
        anonzzzies 18 days ago
        You probably mean antirez porting Flux to C. There were not too many shortcomings in his breakdown; the biggest one, as I saw it, was that his knowledge and experience building large C programs really was a requirement. But given one of these experts, you don't see how that person and Claude Code just replace a team? The less capable people on the team cannot do what he does, so before they were just entering code and getting corrected in reviews or asking for help. Now the AI can do that, but on 10 projects in parallel. In a weekend you won't have time for that, but not everything has to be done in a weekend.
- sekai 20 days ago
  > I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
  Coding was never the hard part of software development.
  [-]
  - pelorat 20 days ago
    Getting the architecture mostly right, so it's easy to maintain and modify in the future, is IMO the hard part, but I find that this is where AI shines. I have 20 years of SWE experience (professional) and 10 (hobby), and most of my AI use is for architecture and scaffolding first, code second.
- 523-asf1 20 days ago
  Gotta make sure that the investors read this message in an Erdos thread.
- daxfohl 20 days ago
  They already do. What they suck at is common sense. Unfortunately good software requires both.
  [-]
  - anonzzzies 20 days ago
    [flagged]
    [-]
    - 523-asf1 20 days ago
      Even a 20 year old Markov chain could produce this banality.
  - marktl 20 days ago
    Or is it fortunate (for a short period at least).
- AtlasBarfed 20 days ago
  Is this comment written by AI?
- user3939382 20 days ago
  They can only code to specification which is where even teams of humans get lost. Without much smarter architecture for AI (LLMs as is are a joke) that needle isn’t going to move.
  [-]
  - danielbln 20 days ago
    Real HN comment right here. "LLMs are a joke" - maybe don't drink the anti-hype kool aid, you'll blind yourself to the capability space that's out there, even if it's not AGI or whatever.
    [-]
    - user3939382 19 days ago
      I’ll look past the disrespectful flippant insult on the hope that there’s a brain there too.
      They’re a probabilistic phonograph. They can sharpen the funnel for input but they can’t provide judgement on input or resolve ambiguities in your specifications. Teams of human requirements engineers cannot do it. LLMs are not magic. You’re essentially asking it: from my wardrobe pick an outfit for me and make sure it’s the one I would have picked.
      If you’re dazzled into thinking LLMs can solve this you just don’t understand transformer architecture and you don’t understand requirements engineering.
      You’ll know a proper AI engine when you see it and it doesn’t look like an LLM.
      [-]
      - int_19h 18 days ago
        Humans aren't magic either. LLMs don't need to be magic to be useful, or to replace humans for that matter.
        [-]
        user3939382 17 days ago
        Humans are magic from the LLM's perspective because the token window sizes they would need to approach human experiential disambiguation of requirements would be orders of magnitude larger. Being useful in general, or replacing some human activities in general, is a goalpost shift that was never the discussion here.
syngrog66 19 days ago
I can post a long list of simple things a human can do accurately and efficiently that I've seen Gemini unable to do, repeatedly.
[-]
- thunky 19 days ago
  And someone could post an even longer list of things you can't do well. But what would be the point?
  The LLM did better on this problem than 100% of the haters in this thread could do, and who probably can't even begin to "understand" the problem.
logicallee 20 days ago
how did they do it? Was a human using the chat interface? Did they just type out the problem and immediately on the first reply received a complete solution (one-shot) or what was the human's role? What was ChatGPT's thinking time?
[-]
- phelm 20 days ago
  Here's the chat https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
  [-]
  - logicallee 20 days ago
    very interesting. ChatGPT reasoned for 41 minutes about it! Also, this was one-shot - i.e. ChatGPT produced its complete proof with a single prompt and no more replies by the human, (rather than a chat where the human further guided it.)
ironbound 19 days ago
Sounds like Lean 4/rocq did all the work here
[-]
- wasabi991011 19 days ago
  Why do you say that? I see no mention of lean/rocq on the twitter thread, nor on the erdos problem forum thread, nor on the chatGPT conversation.
supermatt 20 days ago
What does "solved with" mean? The author claims "I've solved", so did the author solve it or GPT?
[-]
- klohto 20 days ago
  When you use a calculator, did you really solve it or was it the calculator?
  [-]
  - supermatt 20 days ago
    With a calculator I supply the arithmetic. It just executes it with no reasoning so I'm the solver. I can do the same with an LLM and still be the solver as long as it just follows my direction. Or I can give it a problem and let it reason and generate the arithmetic itself, in which case the LLM is effectively the solver. That's why saying "I've solved X using only GPT" is ambiguous.
    But thanks for the downvote in addition to your useless comment.
dernett 20 days ago
This is crazy. It's clear that these models don't have human intelligence, but it's undeniable at this point that they have _some_ form of intelligence.
[-]
- brendyn 20 days ago
  If LLMs weren't created by us but were something discovered in another species' behaviour it would be 100% labelled intelligence
  [-]
  - te0006 19 days ago
    Yes, same for the case where the technology would have been found embodied in machinery aboard a crashed UFO.
- qudat 20 days ago
  My take is that a huge part of human intelligence is pattern matching. We just didn’t understand how much multidimensional geometry influenced our matches
  [-]
  - keeda 20 days ago
    Yes, it could be that intelligence is essentially a sophisticated form of recursive, brute force pattern matching.
    I'm beginning to think the Bitter Lesson applies to organic intelligence as well, because basic pattern matching can be implemented relatively simply using very basic mathematical operations like multiply and accumulate, and so it can scale with massive parallelization of relatively simple building blocks.
    [-]
    - bob1029 20 days ago
      Intelligence is almost certainly a fundamentally recursive process.
      The ability to think about your own thinking over and over as deeply as needed is where all the magic happens. Counterfactual reasoning occurs every time you pop a mental stack frame. By augmenting our stack with external tools (paper, computers, etc.), we can extend this process as far as it needs to go.
      LLMs start to look a lot more capable when you put them into recursive loops with feedback from the environment. A trillion tokens worth of "what if..." can be expended without touching a single token in the caller's context. This can happen at every level as many times as needed if we're using proper recursive machinery. The theoretical scaling around this is extremely favorable.
      [-]
      - qudat 19 days ago
        Anatomically good candidate, the thalamo-cortical loop: https://en.wikipedia.org/wiki/Cortico-basal_ganglia-thalamo-...
  - sdwr 20 days ago
    I don't think it's accurate to describe LLMs as pattern matching. Prediction is the mechanism they use to ingest and output information, and they end up with a (relatively) deep model of the world under the hood.
    [-]
    - visarga 20 days ago
      The "pattern matching" perspective is true if you zoom in close enough, just like "protein reactions in water" is true for brains. But if you zoom out you see both humans and LLMs interact with external environments which provide opportunity for novel exploration. The true source of originality is not inside but in the environment. Making it be all about the model inside is a mistake, what matters more than the model is the data loop and solution space being explored.
    - D-Machine 20 days ago
      "Pattern matching" is not sufficiently specified here for us to say if LLMs do pattern matching or not. E.g. we can say that an LLM predicts the next token because that token (or rather, its embedding) is the best "match" to the previous tokens, which form a path ("pattern") in embedding space. In this sense LLMs are most definitely pattern matching. Under other formulations of the term, they may not be (e.g. when pattern matching refers to abstraction or abstracting to actual logical patterns, rather than strictly semantic patterns).
    - qudat 19 days ago
      > I don't think it's accurate to describe LLMs as pattern matching
      I’m talking about the inference step, which uses tensor geometry arithmetic to find patterns in text. We don’t understand what those patterns are but it’s clear it’s doing some heavy lifting since llm inference is expressing logic and reasoning under the guise of our reductive “next token prediction”
    - keeda 20 days ago
      Yes, the world model building is achieved via pattern matching and happens during ingestion and training, but that is also part of the intelligence.
    - DrewADesign 20 days ago
      Which is even more true for humans.
  - csomar 20 days ago
    Intelligence is hallucination that happens to produce useful results in the real world.
- threethirtytwo 20 days ago
  I don't think they will ever have human intelligence. It will always be an alien intelligence.
  But I think the trend line unmistakably points to a future where it can be MORE intelligent than a human in exactly the colloquial way we define "more intelligent"
  The fact that one of the greatest mathematicians alive has a page and is seriously benchmarking this shows how likely he believes this can happen.
- eru 20 days ago
  Well, Alpha Go and Stockfish can beat you at their games. Why shouldn't these models beat us at math proofs?
  [-]
  - _fizz_buzz_ 20 days ago
    Chess and Go have very restrictive rules. It seems a lot more obvious to me why a computer can beat a human at it. They have a huge advantage just by being able to calculate very deep lines in a very short time. I actually find it impressive for how long humans were able to beat computers at go. Math proofs seem a lot more open ended to me.
  - thfuran 20 days ago
    Alpha go and stockfish were specifically designed and trained to win at those games.
    [-]
    - Davidzheng 20 days ago
      And we can train models specifically at math proofs? I think only difference is that math is bigger....
- ekianjo 20 days ago
  It's pattern matching. Which is actually what we measure in IQ tests, just saying.
  [-]
  - jadenpeterson 20 days ago
    There's some nuance. IQ tests measure pattern matching and, in an underlying way, other facets of intelligence - memory, for example. How well can an LLM 'remember' a thing? Sometimes Claude will perform compaction when its context window reaches 200k "tokens" then it seems a little colder to me, but maybe that's just my imagination. I'm kind of a "power user".
  - rurban 20 days ago
    I call it matching. Pattern matching had a different meaning.
    [-]
    - ekianjo 20 days ago
      what are you referring to? LLMs are neural networks at their core and the most simple versions of neural networks are all about reproducing patterns observed during training
      [-]
      - rurban 20 days ago
        You need to understand the difference between general matching and pattern matching. Maybe you should have read more older AI books. An LLM is a general fuzzy matcher. A pattern matcher is an exact matcher using an abstract language, the "pattern". A general matcher uses a distance function instead, no pattern needed.
        I.e., you want to find a subimage in a big image, possibly rotated, scaled, tilted, distorted, with noise. You cannot do that with a pattern matcher, but you can do that with a matcher, such as a fuzzy matcher, an LLM.
        You want to find a go position on a go board. An LLM is perfect for that, because you don't need to come up with a special language to describe go positions (older chess programs did that), you just train the model on whether that position is good or bad, and this can be fully automated via existing literature and later by playing against itself. You train the matcher not via patterns but via a function (win or lose).
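        The distinction drawn above, exact matching against a pattern language versus nearest-neighbour matching under a distance function, can be sketched roughly as follows. This is a toy on strings, not a model of how an LLM works internally; the helper names `pattern_match` and `fuzzy_match` and the choice of `difflib.SequenceMatcher` as the distance are assumptions made for the illustration.

```python
import re
from difflib import SequenceMatcher


def pattern_match(pattern: str, text: str) -> bool:
    """Exact matcher: the text either conforms to the pattern or it does not."""
    return re.fullmatch(pattern, text) is not None


def fuzzy_match(query: str, candidates: list[str]) -> str:
    """General matcher: return the candidate closest to the query.

    No pattern language is involved. The only ingredient is a distance
    function, here 1 minus the SequenceMatcher similarity ratio.
    """
    def distance(a: str, b: str) -> float:
        return 1.0 - SequenceMatcher(None, a, b).ratio()

    return min(candidates, key=lambda c: distance(query, c))


if __name__ == "__main__":
    # The regex accepts "color" and "colour" but rejects the noisy "colr".
    print(pattern_match(r"colou?r", "color"))
    print(pattern_match(r"colou?r", "colr"))
    # The distance-based matcher still maps the noisy input to a neighbour.
    print(fuzzy_match("colr", ["color", "table", "cloud"]))
```

        The pattern matcher fails outright on distorted input, while the distance-based one degrades gracefully, which is the property the subimage example relies on.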
- altmanaltman 20 days ago
  Depends on what you mean by intelligence, human intelligence and human
- TZubiri 20 days ago
  As someone who doesn't understand this shit, and how it's always the experts who fiddle the LLMs to get good outputs, it feels natural to attribute the intelligence to the operator (or the training set), rather than the LLM itself.
- xyzsparetimexyz 19 days ago
  Yes it is intelligent, but so what? It's not conscious, sentient or sapient. It's a pattern matching Chinese room.
magicalist 20 days ago
Funny seeing silicon valley bros commenting "you're on fire!" to Neel when it appears he copied and pasted the problem verbatim into chatGPT and it did literally all the other work here
https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
[-]
- inimino 19 days ago
  Knowing which problem to copy and paste into the model is also a skill.
YouAreWRONGtoo 19 days ago
[dead]
ath3nd 19 days ago
[dead]
jrflowers 20 days ago
Narrator: The solution had already appeared several times in the training data
ares623 20 days ago
This must be what it feels like to be a CEO and someone tells me they solved coding.
beders 20 days ago
Has anyone confirmed the solution is not in the training data? Otherwise it is just a bit of information retrieval, LLM style. No intelligence necessary.
[-]
- ath3nd 19 days ago
  [dead]