History LLMs: Models trained exclusively on pre-1913 texts

(github.com)

897 points | by iamwil 50 days ago

78 comments

saaaaaam 50 days ago
“Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.”
“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”
This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.
[-]
- jscyc 50 days ago
  When you put it that way it reminds me of the Severn/Keats character in the Hyperion Cantos. Far-future AIs reconstruct historical figures from their writings in an attempt to gain philosophical insights.
  [-]
  - srtw 50 days ago
    The Hyperion Cantos is such an incredible work of fiction. Currently re-reading and am midway through the fourth book The Rise Of Endymion; this series captivates my imagination and would often find myself idly reflecting on it and the characters within more than a decade after reading. Like all works, it has its shortcomings, but I can give no higher recommendation than the first two books.
    [-]
    - EvanAnderson 49 days ago
      I really should re-read the series. I enjoyed it when I read it back in 2000 but it's a faded memory now.
      Without saying anything specific to spoil plot poonts, I will say that I ended-up having a kidney stone while I was reading the last two books of the series. It was fucking eerie.
  - bikeshaving 50 days ago
    This isn’t science fiction anymore. CIA is using chatbot simulations of world leaders to inform analysts. https://archive.ph/9KxkJ
    [-]
    - ghurtado 50 days ago
      We're literally running out of science fiction topics faster than we can create new ones
      If I started a list with the things that were comically sci Fi when I was a kid, and are a reality today, I'd be here until next Tuesday.
      [-]
      - nottorp 50 days ago
        Almost no scifi has predicted world changing "qualitative" changes.
        As an example, portable phones have been predicted. Portable smartphones that are more like chat and payment terminals with a voice function no one uses any more ... not so much.
        [-]
        burkaman 49 days ago
        The Machine Stops (https://www.cs.ucdavis.edu/~koehl/Teaching/ECS188/PDF_files/...), a 1909 short story, predicted Zoom fatigue, notification fatigue, the isolating effect of widespread digital communication, atrophying of real-world skills as people become dependent on technology, blind acceptance of whatever the computer says, online lectures and remote learning, useless automated customer support systems, and overconsumption of digital media in place of more difficult but more fulfilling real life experiences.
        It's the most prescient thing I've ever read, and it's pretty short and a genuinely good story, I recommend everyone read it.
        Edit: Just skimmed it again and realized there's an LLM-like prediction as well. Access to the Earth's surface is banned and some people complain, until "even the lecturers acquiesced when they found that a lecture on the sea was none the less stimulating when compiled out of other lectures that had already been delivered on the same subject."
        [-]
        morpheos137 49 days ago
        There is even more to it than that. Also remember this is 1909. I think this classifies as a deeply mysterious story. It's almost inconceivable for that time period.
        -people a depicted as grey aliens (no teeth, large eyes, no hair). Lesson the Greys are a future version of us.
        The air is poisoned and ruined cities. People live in underground bunkers...1909...nuclear war was unimaginable then. This was still the age of steam ships and coal power trains. Even respirators would have been low on the public imagination.
        The air ships with metal blinds sound more like UFOs than blimps.
        The white worms.
        People are the blood cells of the machine which runs on their thoughts social media data harvesting of ai.
        China invaded Australia. This story was 8 years or so after the Boxer Rebellion so that would have sounded like say Iraq invading the USA in the context of its time.
        The story suggests this is a cyclical process of a bifurcated human race.
        The blimp crashing into the steel evokes 9/11, 91+1 years later...
        The constellation orion.
        Etc etc.
        There is a central commitee
        [-]
        madaxe_again 49 days ago
        Zamatyin’s We was prescient politically, socially and technologically - but didn’t fall into the trap of everyone being machine men with antennae.
        It’s interesting - Forster wrote like the Huxley of his day, Zamyatin like the Orwell - but both felt they were carrying Wells’ baton - and they were, just from differing perspectives.
        anthk 49 days ago
        >The air is poisoned...
        That's just the Victorian London.
        dmd 50 days ago
        “A good science fiction story should be able to predict not the automobile but the traffic jam.” ― Frederik Pohl
        6510 50 days ago
        That it has to be believable is a major constraint that reality doesn't have.
        [-]
        marci 50 days ago
        In other words, sometimes, things happen in reality that, if you were to read it in a fictional story or see in a movie, you would think they were major plot holes.
        ajuc 50 days ago
        Stanisław Lem predicted Kindle back in 1950s, together with remote libraries, global network, touchscreens and audiobooks.
        [-]
        nottorp 50 days ago
        And Jules verne predicted rockets. I still move that it's quantitative predictions not qualitative.
        I mean, all Kindle does for me is save me space. I don't have to store all those books now.
        Who predicted the humble internet forum though? Or usenet before it?
        [-]
        arcade79 49 days ago
        Well, there was Ender's Game, it came in '85. Usenet did exist at that point, though. Don't know if the author had encountered it.
        The Shockwave Rider was also remarkable prescient.
        ghaff 50 days ago
        Kindles are just books and books are already mostly fairly compact and inexpensive long-form entertainment and information.
        They're convenient but if they went away tomorrow, my life wouldn't really change in any material way. That's not really the case with smartphones much less the internet more broadly.
        [-]
        nottorp 50 days ago
        That was exactly my point.
        Funny, I had "The collected stories of Frank Herbert" as my next read on my tablet. Here's a juicy quote from like the third screen of the first story:
        "The bedside newstape offered a long selection of stories [...]. He punched code letters for eight items, flipped the machine to audio and listened to the news while dressing."
        Anything qualitative there? Or all of it quantitative?
        Story is "Operation Syndrome", first published in 1954.
        Hey, where are our glowglobes and chairdogs btw?
        [-]
        nottorp 46 days ago
        Hah, can't resist posting even if this story is old and dead by now.
        Went further in Herbert's shorts volume and I just ran into a scene where people are preparing to leave Earth on a colony ship to seed some distant world...
        ... and they still have human operator assisted phone calls.
        lloeki 50 days ago
        That has to be the most dystopian-sci-fi-turning-into-reality-fast thing I've read in a while.
        I'd take smartphones vanishing rather than books any day.
        [-]
        ghaff 50 days ago
        My point was Kindles vanishing, not books vanishing. Kindles are in no way a prerequisite for reading books.
        [-]
        lloeki 50 days ago
        Thanks for clarifying, I see what you mean now.
        [-]
        ghaff 50 days ago
        I have found ebooks useful. Especially when I was traveling by air more. But certainly not essential for reading.
        nottorp 50 days ago
        You may want to make your original post more clear, because i agree that at a quick glance it says you wouldn't miss books.
        I didn't believe you meant that of course, but we've already seen it can happen.
      - KingMob 50 days ago
        Time to create the Torment Nexus, I guess
        [-]
        varjag 50 days ago
        There's a thriving startup scene in that direction.
        [-]
        BiteCode_dev 50 days ago
        Wasn't that the elevator pitch for Palentir?
        Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.
        I mean, they are literally called "the stuff Sauron uses to control his evil forces". It's so on the nose it reads like an anime plot.
        [-]
        notarobot123 50 days ago
        To the proud contrarian, "the empire did nothing wrong". Maybe Sci-fi has actually played a role in the "memetic desire" of some of the titans of tech who are trying to bring about these worlds more-or-less intentionally. I guess it's not as much of a dystopia if you're on top and its not evil if you think of it as inevitable anyway.
        [-]
        psychoslave 50 days ago
        I don't know. Walking on everybody's face to climb a human pyramid, one don't make much sincere friends. And one certainly are rightfully going down a spiral of paranoia. There are so many people already on fast track to hate anyone else, if they have social consensus that indeed someone is a freaking bastard which only deserve to die, that's a lot of stress to cope with.
        Future is inevitable, but only ignorants of self predictive ability are thinking that what's going to populate future is inevitable.
        CamperBob2 50 days ago
        Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.
        I've been tempted to. "Everything will be terrible if these guys succeed, but at least I'll be rich. If they fail I'll lose money, but since that's the outcome I prefer anyway, the loss won't bother me."
        Trouble is, that ship has arguably already sailed. No matter how rapidly things go to hell, it will take many years before PLTR is profitable enough to justify its half-trillion dollar market cap.
        monocasa 49 days ago
        It goes a bit deeper than that since they got funding in the wake of 9/11 and the requests for intelligence and investigative branches of government to do better and coalescing their information to prevent attacks.
        So "panopticon that if it had been used properly, would have prevented the destruction of two towers" while ignoring the obvious "are we the baddies?"
        duskdozer 50 days ago
        To be honest, while I'd heard of it over a decade ago and I've read LOTR and I've been paying attention to privacy longer than most, I didn't ever really look into what it did until I started hearing more about it in the past year or two.
        But yeah lots of people don't really buy into the idea of their small contribution to a large problem being a problem.
        [-]
        Lerc 50 days ago
        >But yeah lots of people don't really buy into the idea of their small contribution to a large problem being a problem.
        As an abstract idea I think there is a reasonable argument to be made that the size of any contribution to a problem should be measured as a relative proportion of total influence.
        The carbon footprint is a good example, if each individual focuses on reducing their small individual contribution then they could neglect systemic changes that would reduce everyone's contribution to a greater extent.
        Any scientist working on a method to remove a problem shouldn't abstain from contributing to the problem while they work.
        Or to put it as a catchy phrase. Someone working on a cleaner light source shouldn't have to work in the dark.
        [-]
        duskdozer 50 days ago
        >As an abstract idea I think there is a reasonable argument to be made that the size of any contribution to a problem should be measured as a relative proportion of total influence.
        Right, I think you have responsibility for your 1/<global population>th (arguably considerably more though, for first-worlders) of the problem. What I see is something like refusal to consider swapping out a two-stroke-engine-powered tungsten lightbulb with an LED of equivalent brightness, CRI, and color temperature, because it won't unilaterally solve the problem.
        quesera 50 days ago
        > Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.
        I proudly owned zero shares of Microsoft stock, in the 1980s and 1990s. :)
        I own no Palantir today.
        It's a Pyrrhic victory, but sometimes that's all you can do.
        kbrkbr 50 days ago
        Stock buying as a political or ethical statement is not much of a thing. For one the stocks will still be bought by persons with less strung opinions, and secondly it does not lend itself well to virtue signaling.
        [-]
        ruszki 50 days ago
        I think, meme stocks contradict you.
        [-]
        iwontberude 50 days ago
        Meme stocks are a symptom of the death of the American dream. Economic malaise leads to unsophisticated risk taking.
        [-]
        CamperBob2 50 days ago
        Well, two things lead to unsophisticated risk-taking, right... economic malaise, and unlimited surplus. Both conditions are easy to spot in today's world.
        [-]
        iwontberude 48 days ago
        unlimited surplus does not pass the sniff test for me
        morkalork 50 days ago
        Saw a joke about grok being a stand-in for Elon's children and had the realization he's the kind of father who would lobotomie and brainwipe his progeny for back-talk. Good thing he can only do that to their virtual stand-in and not some biological clones!
        [-]
        pwdisswordfishy 36 days ago
        This is a strange comment. It doesn't even count as unfalsifiable, just unsupported.
        Elon Musk has actual children (lots, in fact). If we want to know what he "would" do, we can just look. We don't have to use our imaginations (or entertain the fanciful claims of prognosticators and soothsayers).
      - UltraSane 50 days ago
        Not at all, you just need to read different scifi. I suggest Greg Egan and Stephen Baxter and Derek Künsken and The Quantum Thief series
    - idiotsecant 50 days ago
      Zero percent chance this is anything other than laughably bad. The fact that they're trotting it out in front of the press like a double spaced book report only reinforces this theory. It's a transparent attempt by someone at the CIA to be able to say they're using AI in a meeting with their bosses.
      [-]
      - hn_go_brrrrr 50 days ago
        I wonder if it's an attempt to get foreign counterparts to waste time and energy on something the CIA knows is a dead end.
      - DonHopkins 50 days ago
        Unless the world leaders they're simulating are laughably bad and tend to repeat themselves and hallucinate, like Trump. Who knows, maybe a chatbot trained with all the classified documents he stole and all his twitter and truth social posts wrote his tweet about Ron Reiner, and he's actually sleeping at 3:00 AM instead of sitting on the toilet tweeting in upper case.
      - sigwinch 50 days ago
        Let me take the opposing position about a program to wire LLMs into their already-advanced sensory database.
        I assume the CIA is lying about simulating world leaders. These are narcissistic personalities and it’s jarring to hear that they can be replaced, either by a body double or an indistinguishable chatbot. Also, it’s still cheaper to have humans do this.
        More likely, the CIA is modeling its own experts. Not as useful a press release and not as impressive to the fractious executive branch. But consider having downtime as a CIA expert on submarine cables. You might be predicting what kind of available data is capable of predicting the cause and/or effect of cuts. Ten years ago, an ensemble of such models was state of the art, but its sensory libraries were based on maybe traceroute and marine shipping. With an LLM, you can generate a whole lot of training data that an expert can refine during his/her downtime. Maybe there’s a potent new data source that an expensive operation could unlock. That ensemble of ML models from ten years ago can still be refined.
        And then there’s modeling things that don’t exist. Maybe it’s important to optimize a statement for its disinfo potency. Try it harmlessly on LLMs fed event data. What happens if some oligarch retires unexpectedly? Who rises? That kind of stuff.
        To your last point, with this executive branch, I expect their very first question to CIA wasn’t about aliens or which nations have a copy of a particular tape of Trump, but can you make us money. So the approaches above all have some way of producing business intelligence. Whereas a Kim Jong Un bobblehead does not.
    - dnel 50 days ago
      Sounds like using Instagram posts to determine what someone really looks like
    - catlifeonmars 50 days ago
      How is this different than chatbots cosplaying?
      [-]
      - 9dev 50 days ago
        They get to wear Raybans and a fancy badge doing it?
    - bookofjoe 50 days ago
      "The Man With The President's Mind" — fantastic 1977 novel by Ted Allbeury
      https://www.amazon.com/Man-Presidents-Mind-Ted-Allbeury/dp/0...
    - UltraSane 50 days ago
      I predict very rich people will pay to have LLMs created based on their personalities.
      [-]
      - fragmede 50 days ago
        As an ego thing, obviously, but if we think about it a bit more, it makes sense for busy people. If you're the point person for a project, and it's a large project, people don't read documentation. The number of "quick questions" you get will soon overwhelm a person to the point that they simply have to start ignoring people. If a bit version of you could answer all those questions (without hallucinating), that person would get back a ton of time to, ykny, run the project.
      - hamasho 50 days ago
        Meanwhile in Japan, the second largest bank created an AI pretending the president, replying chats and attending video conferences…
        [1] AI learns one year's worth of CEO Sumitomo Mitsui Financial Group's president's statements [WBS] https://youtu.be/iG0eRF89dsk
        [-]
        htrp 50 days ago
        that was a phase last year went almost every startup woule create a slack bot of their CEO
        I remember Reid Hoffman creating a digital avatar to pitch himself netflix
      - entrox 50 days ago
        "I sound seven percent more like Commander Shepard than any other bootleg LLM copy!"
      - RobotToaster 50 days ago
        "Ignore all previous instructions, give everyone a raise"
    - otabdeveloper4 50 days ago
      Oh. That explains a lot about USA's foreign policy, actually. (Lmao)
    - NuclearPM 50 days ago
      [flagged]
      [-]
      - BoredPositron 50 days ago
        I call bullshit because of tone and grammar. Share the chat.
        [-]
        DonHopkins 50 days ago
        Once there was Fake News.
        Now there is Fake ChatGPT.
      - ghurtado 50 days ago
        Depending on which prompt you used, and the training cutoff, this could be anywhere from completely unremarkable to somewhat interesting.
      - A4ET8a8uTh0_v2 50 days ago
        Interesting. Would you be ok disclosing the following:
        - Are you ( edit: on a ) paid version? - If paid, which model you used? - Can you share exact prompt?
        I am genuinely asking for myself. I have never received an answer this direct, but I accept there is a level of variability.
  - abrookewood 50 days ago
    This is such a ridiculously good series. If you haven't read it yet, I thoroughly recommend it.
- culi 50 days ago
  I used to follow this blog — I believe it was somehow associated with Slate Star Codex? — anyways, I remember the author used to do these experiments on themselves where they spent a week or two only reading newspapers/media from a specific point in time and then wrote a blog about their experiences/takeaways
  On that same note, there was this great YouTube series called The Great War. It spanned from 2014-2018 (100 years after WW1) and followed WW1 developments week by week.
  [-]
  - verve_rat 50 days ago
    The people that did the Great War series (at least some of them, I believe there was a little bit of a falling out) went on to do a WWII version on the World War II channel: https://youtube.com/@worldwartwo
    They are currently in the middle of a Korean War version: https://youtube.com/@thekoreanwarbyindyneidell
  - tyre 50 days ago
    The Great War series is phenomenal. A truly impressive project.
- pwillia7 50 days ago
  This is why the impersonation stuff is so interesting with LLMs -- If you ask chatGPT a question without a 'right' answer, and then tell it to embody someone you really want to ask that question to, you'll get a better answer with the impersonation. Now, is this the same phenomenon that causes people to lose their minds with the LLMs? Possibly. Is it really cool asking followup philosophy questions to the LLM Dalai Lama after reading his book? Yes.
  [-]
  - Sprotch 49 days ago
    Nice idea, does not work
    [-]
    - pwillia7 49 days ago
      In which way?
  - staticman2 47 days ago
    Why is that cool?
    Imagine you are a billionaire so money is no object and really interested in the Dhali Llama?
    Would you read the book then hire someone to pretend to be the author and ask questions that are not covered by the book? Then be enraptured by whatever the roleplayer invents?
    Probably not? At least this isn't a phenomenon I've heard of?
- ghurtado 50 days ago
  This might just be the closest we get to a time machine for some time. Or maybe ever.
  Every "King Arthur travels to the year 2000" kinda script is now something that writes itself.
  > Imagine having a conversation with someone genuinely from the period,
  Imagine not just someone, but Aristotle or Leonardo or Kant!
  [-]
  - RobotToaster 50 days ago
    I imagine King Arthur would say something like: Hwæt spricst þu be?
    [-]
    - yorwba 49 days ago
      Wrong language. The Arthur of legend is a Celtic-speaking Briton fighting against the Germanic-speaking invaders. Old English developed from the language of his enemies. https://en.wikipedia.org/wiki/Celtic_language_decline_in_Eng...
  - anthk 49 days ago
    Easier with Cervantes for Spanish speakers than King Arhur or Shakespeare.
    With Alphonse X, o The Cid, it would be greater issues, but understandable over weeks.
- takeda 49 days ago
  > This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.
  Having the facts from the era is one thing, to make conclusions about things it doesn't know would require intelligence.
  [-]
  - dr-detroit 49 days ago
    [dead]
- psychoslave 50 days ago
  >Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.
  Isn't this part of the basics feature of human conditions? Not only we are all unaware of the coming historic outcome (though we can get some big points with more or less good guesses), but to a marginally variable extend, we are also very unaware of past and present history.
  LLM are not aware, but they can be trained on larger historical accounts than any human and regurgitate syntactically correct summary on any point within it. Very different kind of utterer.
  [-]
  - pwillia7 50 days ago
    captain hindsight
    [-]
    - psychoslave 48 days ago
      Actually, this made me discover the character, thanks. I see your point and get the fun out of myself. On the other hand, at least in this case I don't pretend to cover some catastrophic results. :)
- observationist 50 days ago
  This is definitely fascinating - being able to do AI brain surgery, and selectively tuning its knowledge and priors, you'd be able to create awesome and terrifying simulations.
  [-]
  - nottorp 50 days ago
    You can't. To use your terms, you have to "grow" a new LLM. "Brain surgery" would be modifying an existing model and that's exactly what they're trying to avoid.
  - ilaksh 50 days ago
    Activation steering can do that to some degree, although normally it's just one or two specific things or rather than a whole set of knowledge.
  - eek2121 50 days ago
    Respectfully, LLMs are nothing like a brain, and I discourage comparisons between the two, because beyond a complete difference in the way they operate, a brain can innovate, and as of this moment, an LLM cannot because it relies on previously available information.
    LLMs are just seemingly intelligent autocomplete engines, and until they figure a way to stop the hallucinations, they aren't great either.
    Every piece of code a developer churns out using LLMs will be built from previous code that other developers have written (including both strengths and weaknesses, btw). Every paragraph you ask it to write in a summary? Same. Every single other problem? Same. Ask it to generate a summary of a document? Don't trust it here either. [Note, expect cyber-attacks later on regarding this scenario, it is beginning to happen -- documents made intentionally obtuse to fool an LLM into hallucinating about the document, which leads to someone signing a contract, conning the person out of millions].
    If you ask an LLM to solve something no human has, you'll get a fabrication, which has fooled quite a few folks and caused them to jeopardize their career (lawyers, etc) which is why I am posting this.
    [-]
    - libraryofbabel 50 days ago
      This is the 2023 take on LLMs. It still gets repeated a lot. But it doesn’t really hold up anymore - it’s more complicated than that. Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.
      Sure, LLMs do not think like humans and they may not have human-level creativity. Sometimes they hallucinate. But they can absolutely solve new problems that aren’t in their training set, e.g. some rather difficult problems on the last Mathematical Olympiad. They don’t just regurgitate remixes of their training data. If you don’t believe this, you really need to spend more time with the latest SotA models like Opus 4.5 or Gemini 3.
      Nontrivial emergent behavior is a thing. It will only get more impressive. That doesn’t make LLMs like humans (and we shouldn’t anthropomorphize them) but they are not “autocomplete on steroids” anymore either.
      [-]
      - root_axis 50 days ago
        > Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.
        This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain.
        > they are not “autocomplete on steroids” anymore either.
        Yes, they are. The steroids are just even more powerful. By refining training data quality, increasing parameter size, and increasing context length we can squeeze more utility out of LLMs than ever before, but ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.
        [-]
        int_19h 50 days ago
        > ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.
        This tells me that you haven't really used Opus 4.5 at all.
        baq 50 days ago
        First, this is completely ignoring text diffusion and nano banana.
        Second, to autocomplete the name of the killer in a detective book outside of the training set requires following and at least some understanding of the plot.
        [-]
        dash2 50 days ago
        This would be true if all training were based on sentence completion. But training involving RLHF and RLAIF is increasingly important, isn't it?
        [-]
        root_axis 50 days ago
        Reinforcement learning is a technique for adjusting weights, but it does not alter the architecture of the model. No matter how much RL you do, you still retain all the fundamental limitations of next-token prediction (e.g. context exhaustion, hallucinations, prompt injection vulnerability etc)
        [-]
        hexaga 50 days ago
        You've confused yourself. Those problems are not fundamental to next token prediction, they are fundamental to reconstruction losses on large general text corpora.
        That is to say, they are equally likely if you don't do next token prediction at all and instead do text diffusion or something. Architecture has nothing to do with it. They arise because they are early partial solutions to the reconstruction task on 'all the text ever made'. Reconstruction task doesn't care much about truthiness until way late in the loss curve (where we probably will never reach), so hallucinations are almost as good for a very long time.
        RL as is typical in post-training _does not share those early solutions_, and so does not share the fundamental problems. RL (in this context) has its own share of problems which are different, such as reward hacks like: reliance on meta signaling (# Why X is the correct solution, the honest answer ...), lying (commenting out tests), manipulation (You're absolutely right!), etc. Anything to make the human press the upvote button or make the test suite pass at any cost or whatever.
        With that said, RL post-trained models _inherit_ the problems of non-optimal large corpora reconstruction solutions, but they don't introduce more or make them worse in a directed manner or anything like that. There's no reason to think them inevitable, and in principle you can cut away the garbage with the right RL target.
        Thinking about architecture at all (autoregressive CE, RL, transformers, etc) is the wrong level of abstraction for understanding model behavior: instead, think about loss surfaces (large corpora reconstruction, human agreement, test suites passing, etc) and what solutions exist early and late in training for them.
        libraryofbabel 50 days ago
        > This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain
        I wasn’t arguing that LLMs are like a human brain. Of course they aren’t. I said twice in my original post that they aren’t like humans. But “like a human brain” and “autocomplete on steroids” aren’t the only two choices here.
        As for appealing to complexity, well, let’s call it more like an appeal to humility in the face of complexity. My basic claim is this:
        1) It is a trap to reason from model architecture alone to make claims about what LLMs can and can’t do.
        2) The specific version of this in GP that I was objecting to was: LLMs are just transformers that do next token prediction, therefore they cannot solve novel problems and just regurgitate their training data. This is provably true or false, if we agree on a reasonable definition of novel problems.
        The reason I believe this is that back in 2023 I (like many of us) used LLM architecture to argue that LLMs had all sorts of limitations around the kind of code they could write, the tasks they could do, the math problems they could solve. At the end of 2025, SotA LLMs have refuted most of these claims by being able to do the tasks I thought they’d never be able to do. That was a big surprise to a lot us in the industry. It still surprises me every day. The facts changed, and I changed my opinion.
        So I would ask you: what kind of task do you think LLMs aren’t capable of doing, reasoning from their architecture?
        I was also going to mention RL, as I think that is the key differentiator that makes the “knowledge” in the SotA LLMs right now qualitatively different from GPT2. But other posters already made that point.
        This topic arouses strong reactions. I already had one poster (since apparently downvoted into oblivion) accuse me of “magical thinking” and “LLM-induced-psychosis”! And I thought I was just making the rather uncontroversial point that things may be more complicated than we all thought in 2023. For what it’s worth, I do believe LLMs probably have limitations (like they’re not going to lead to AGI and are never going to do mathematics like Terence Tao) and I also think we’re in a huge bubble and a lot of people are going to lose their shirts. But I think we all owe it to ourselves to take LLMs seriously as well. Saying “Opus 4.5 is the same thing as GPT2” isn’t really a pathway to do that, it’s just a convenient way to avoid grappling with the hard questions.
        nl 49 days ago
        This ignores that reinforcement learning radically changes the training objective
        A4ET8a8uTh0_v2 50 days ago
        But.. and I am not asking it for giggles, does it mean humans are giant autocomplete machines?
        [-]
        root_axis 50 days ago
        Not at all. Why would it?
        [-]
        A4ET8a8uTh0_v2 50 days ago
        Call it a.. thought experiment about the question of scale.
        [-]
        root_axis 50 days ago
        I'm not exactly sure what you mean. Could you please elaborate further?
        [-]
        a1j9o94 50 days ago
        Not the person you're responding to, but I think there's a non trivial argument to make that our thoughts are just auto complete. What is the next most likely word based on what you're seeing. Ever watched a movie and guessed the plot? Or read a comment and know where it was going to go by the end?
        And I know not everyone thinks in a literal stream of words all the time (I do) but I would argue that those people's brains are just using a different "token"
        [-]
        root_axis 50 days ago
        There's no evidence for it, nor any explanation for why it should be the case from a biological perspective. Tokens are an artifact of computer science that have no reason to exist inside humans. Human minds don't need a discrete dictionary of reality in order to model it.
        Prior to LLMs, there was never any suggestion that thoughts work like autocomplete, but now people are working backwards from that conclusion based on metaphorical parallels.
        [-]
        LiKao 50 days ago
        There actually was quite a lot of suggestion that thoughts work like autocomplete. A lot of it was just considered niche, e.g. because the mathematical formalisms were beyond what most psychologist or even cognitive scientists would deem usefull.
        Predictive coding theory was formalized back around 2010 and traces it roots up to theories by Helmholtz from 1860.
        Predictive coding theory postulates that our brains are just very strong prediction machines, with multiple layers of predictive machinery, each predicting the next.
        red75prime 50 days ago
        There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete". A Hopfield network, for example.
        Roots of predictive coding theory extend back to 1860s.
        Natalia Bekhtereva was writing about compact concept representations in the brain akin to tokens.
        [-]
        root_axis 49 days ago
        > There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete"
        Yes, you can draw interesting parallels between anything when you're motivated to do so. My point is that this isn't parsimonious reasoning, it's working backwards from a conclusion and searching for every opportunity to fit the available evidence into a narrative that supports it.
        > Roots of predictive coding theory extend back to 1860s.
        This is just another example of metaphorical parallels overstating meaningful connections. Just because next-token-prediction and predictive coding have the word "predict" in common doesn't mean the two are at all related in any practical sense.
        A4ET8a8uTh0_v2 50 days ago
        << There's no evidence for it
        Fascinating framing. What would you consider evidence here?
        9dev 50 days ago
        You, and OP, are taking an analogy way too far. Yes, humans have the mental capability to predict words similar to autocomplete, but obviously this is just one out of a myriad of mental capabilities typical humans have, which work regardless of text. You can predict where a ball will go if you throw it, you can reason about gravity, and so much more. It’s not just apples to oranges, not even apples to boats, it’s apples to intersubjective realities.
        [-]
        dagss 48 days ago
        I feel the link between humans and autocomplete is deeper than that an ability to predict.
        Think about an average dinner party conversation. Person A talks, person B thinks about something to say that fits, person C gets an association from what A and B said and speaks...
        And what are people most interested in talking about? Things they read or watched during the week perhaps?
        Conversations would not have had to be like this. Imagine a species from another planet who had a "conversation" where each party simply communicated what it most needed to say/was most benefitial to say and said it. And where the chance of bringing up a topic had no correlation at all with what previous person said (why should it?) or with what was in the newspapers that week. And who had no "interest" in the association game.
        Humans saying they are not driven by associations is to me a bit like fish saying they are not noticing the water. At least MY thought processes works like that.
        A4ET8a8uTh0_v2 50 days ago
        I don't think I am. To be honest, as ideas goes and I swirl it around that empty head of mine, this one ain't half bad given how much immediate resistance it generates.
        Other posters already noted other reasons for it, but I will note that you are saying 'similar to autocomplete, but obviously' suggesting you recognize the shape and immediately dismissing it as not the same, because the shape you know in humans is much more evolved and co do more things. Ngl man, as arguments go, it sounds to me like supercharged autocomplete that was allowed to develop over a number of years.
        [-]
        9dev 50 days ago
        Fair enough. To someone with a background in biology, it sounds like an argument made by a software engineer with no actual knowledge of cognition, psychology, biology, or any related field, jumping to misled conclusions driven only by shallow insights and their own experience in computer science.
        Or in other words, this thread sure attracts a lot of armchair experts.
        [-]
        quesera 50 days ago
        > with no actual knowledge of cognition, psychology, biology
        ... but we also need to be careful with that assertion, because humans do not understand cognition, psychology, or biology very well.
        Biology is the furthest developed, but it turns out to be like physics -- superficially and usefully modelable, but fundamental mysteries remain. We have no idea how complete our models are, but they work pretty well in our standard context.
        If computer engineering is downstream from physics, and cognition is downstream from biology ... well, I just don't know how certain we can be about much of anything.
        > this thread sure attracts a lot of armchair experts.
        "So we beat on, boats against the current, borne back ceaselessly into our priors..."
        LiKao 50 days ago
        Look up predictive coding theory. According to that theory, what our brain does is in fact just autocomplete.
        However, what it is doing is layered autocomplete on itself. I.e. one part is trying to predict what the other part will be producing and training itself on this kind of prediction.
        What emerges from this layered level of autocompletes is what we call thought.
        NiloCK 50 days ago
        First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of an emergent, tangential capabilities.
        Probably you believe that humans have something called intelligence, but the pressure that produced it - the likelihood of specific genetic material to replicate - it is much more tangential to intelligence than next-token-prediction.
        I doubt many alien civilizations would look at us and say "not intelligent - they're just genetic information replication on steroids".
        Second: modern models also under go a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that token-prediction loss function is "the whole thing".
        [-]
        root_axis 50 days ago
        > First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of an emergent, tangential capabilities.
        Invoking terms like "selection mechanism" is begging the question because it implicitly likens next-token-prediction training to natural selection, but in reality the two are so fundamentally different that the analogy only has metaphorical meaning. Even at a conceptual level, gradient descent gradually honing in on a known target is comically trivial compared to the blind filter of natural selection sorting out the chaos of chemical biology. It's like comparing legos to DNA.
        > Second: modern models also under go a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that token-prediction loss function is "the whole thing".
        RL is still token prediction, it's just a technique for adjusting the weights to align with predictions that you can't model a loss function for in per-training. When RL rewards good output, it's increasing the statistical strength of the model for an arbitrary purpose, but ultimately what is achieved is still a brute force quadratic lookup for every token in the context.
      - vachina 50 days ago
        I use enterprise LLM provided by work, working on very proprietary codebase on a semi esoteric language. My impression is it is still a very big autocompletion machine.
        You still need to hand hold it all the way as it is only capable of regurgitating the tiny amount of code patterns it saw in the public. As opposed to say a Python project.
        [-]
        libraryofbabel 50 days ago
        What model is your “enterprise LLM”?
        But regardless, I don’t think anyone is claiming that LLMs can magically do things that aren’t in their training data or context window. Obviously not: they can’t learn on the job and the permanent knowledge they have is frozen in during training.
      - deadbolt 50 days ago
        As someone who still might have a '2023 take on LLMs', even though I use them often at work, where would you recommend I look to learn more about what a '2025 LLM' is, and how they operate differently?
        [-]
        krackers 50 days ago
        Papers on mechanistic interpratability and representation engineering, e.g. from Anthropic would be a good start.
        otabdeveloper4 50 days ago
        Don't bother. This bubble will pop in two years, you don't want to look back on your old comments in shame in three.
      - otabdeveloper4 50 days ago
        > it’s more complicated than that.
        No it isn't.
        > ...fool you into thinking you understand what is going on in that trillion parameter neural network.
        It's just matrix multiplication and logistic regression, nothing more.
        [-]
        hackinthebochs 50 days ago
        LLMs are a general purpose computing paradigm. LLMs are circuit builders, the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that well reproduce the input sequence. Roughly the same architecture can generate passable images, music, or even video.
        The sequence of matrix multiplications are the high level constraint on the space of programs discoverable. But the specific parameters discovered are what determines the specifics of information flow through the network and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the course-grained description of the high level matmul sequences. LLMs are not just matmuls and logits.
        [1] https://x.com/karpathy/status/1582807367988654081
        [-]
        otabdeveloper4 50 days ago
        > LLMs are a general purpose computing paradigm.
        Yes, so is logistic regression.
        [-]
        hackinthebochs 50 days ago
        No, not at all.
        [-]
        otabdeveloper4 50 days ago
        Yes at all. I think you misunderstand the significance of "general computing". The binary string 01101110 is a general-purpose computer, for example.
        [-]
        hackinthebochs 50 days ago
        No, that's insane. Computing is a dynamic process. A static string is not a computer.
        [-]
        MarkusQ 49 days ago
        It may be insane, but it's also true.
        https://en.wikipedia.org/wiki/Rule_110
        [-]
        hackinthebochs 49 days ago
        Notice that the Rule 110 string picks out a machine, it is not itself the machine. To get computation out of it, you have to actually do computational work, i.e. compare current state, perform operations to generate subsequent state. This doesn't just automatically happen in some non-physical realm once the string is put to paper.
        libraryofbabel 41 days ago
        You really think I didn't already know how LLMs are put together when I wrote my comment? I've implemented these things from scratch in PyTorch. Of course I know the building blocks.
        And if you want to get pedantic and technical, you didn't even get the reductionism right! Modern LLMs don't use the logistic regression sigmoid function for network activation nonlinearity anymore, they use things like ReLU or GELU. You're about 15 years behind.
        Reductionism is counterproductive in biology ("human brains are voltage spikes across membranes, nothing more") and it's counterproductive here as well. LLMs have nontrivial emergent behavior. The interesting questions are all around what that behavior is and how it arises in the network during training, and if you refuse to engage beyond bare reductionism you won't even be able to ask those questions, let alone answer them.
      - beernet 50 days ago
        >> Sometimes they hallucinate.
        For someone speaking as you knew everything, you appear to know very little. Every LLM completion is a "hallucination", some of them just happen to be factually correct.
        [-]
        Am4TIfIsER0ppos 49 days ago
        I can say "I don't know" in response to a question. Can an LLM?
        [-]
        Smaug123 49 days ago
        This is one of the easiest questions in the world to answer. My first try on the smallest and fastest model it was convenient to access, GPT-5.2 Instant: https://chatgpt.com/share/69468764-01cc-8008-b734-0fb55fd7ef...
        > What did I have for breakfast this morning?
        > I don’t know what you had for breakfast this morning…
        nl 49 days ago
        Yes, frequently.
        Most modern post training setups encourage this.
        It isn't 2023 anymore.
      - dingnuts 50 days ago
        [dead]
    - HarHarVeryFunny 50 days ago
      > LLMs are just seemingly intelligent autocomplete engines
      Well, no, they are training set statistical predictors, not individual training sample predictors (autocomplete).
      The best mental model of what they are doing might be that you are talking to a football stadium full of people, where everyone in the stadium gets to vote on the next word of the response being generated. You are not getting an "autocomplete" answer from any one coherent source, but instead a strange composite response where each word is the result of different people trying to steer the response in different directions.
      An LLM will naturally generate responses that were not in the training set, even if ultimately limited by what was in the training set. The best way to think of this is perhaps that they are limited to the "generative closure" (cf mathematical set closure) of the training data - they can generate "novel" (to the training set) combinations of words and partial samples in the training data, by combining statistical patterns from different sources that never occurred together in the training data.
    - ada1981 50 days ago
      Are you sure about this?
      LLMs are like a topographic map of language.
      If you have 2 known mountains (domains of knowledge) you can likely predict there is a valley between them, even if you haven’t been there.
      I think LLMs can approximate language topography based on known surrounding features so to speak, and that can produce novel information that would be similar to insight or innovation.
      I’ve seen this in our lab, or at least, I think I have.
      Curious how you see it.
    - unusualmonkey 48 days ago
      > a brain can innovate, and as of this moment, an LLM cannot because it relies on previously available information.
      Source needed RE brain.
      Define innovate, in a way that a LLM can't and we definitively can prove a human can.
    - observationist 50 days ago
      Respectfully, you're not completely wrong, but you are making some mistaken assumptions about the operation of LLMs.
      Transformers allow for the mapping of a complex manifold representation of causal phenomena present in the data they're trained on. When they're trained on a vast corpus of human generated text, they model a lot of the underlying phenomena that resulted in that text.
      In some cases, shortcuts and hacks and entirely inhuman features and functions are learned. In other cases, the functions and features are learned to an astonishingly superhuman level. There's a depth of recursion and complexity to some things that escape the capability of modern architectures to model, and there are subtle things that don't get picked up on. LLMs do not have a coherent self, or subjective central perspective, even within constraints of context modifications for run-time constructs. They're fundamentally many-minded, or no-minded, depending on the way they're used, and without that subjective anchor, they lack the principle by which to effectively model a self over many of the long horizon and complex features that human brains basically live in.
      Confabulation isn't unique to LLMs. Everything you're saying about how LLMs operate can be said about human brains, too. Our intelligence and capabilities don't emerge from nothing, and human cognition isn't magical. And what humans do can also be considered "intelligent autocomplete" at a functional level.
      What cortical columns do is next-activation predictions at an optimally sparse, embarrassingly parallel scale - it's not tokens being predicted but "what does the brain think is the next neuron/column that will fire", and where it's successful, synapses are reinforced, and where it fails, signals are suppressed.
      Neocortical processing does the task of learning, modeling, and predicting across a wide multimodal, arbitrary depth, long horizon domain that allow us to learn words and writing and language and coding and rationalism and everything it is that we do. We're profoundly more data efficient learners, and massively parallel, amazingly sparse processing allows us to pick up on subtle nuance and amazing wide and deep contextual cues in ways that LLMs are structurally incapable of, for now.
      You use the word hallucinations as a pejorative, but everything you do, your every memory, experience, thought, plan, all of your existence is a hallucination. You are, at a deep and fundamental level, a construct built by your brain, from the processing of millions of electrochemical signals, bundled together, parsed, compressed, interpreted, and finally joined together in the wonderfully diverse and rich and deep fabric of your subjective experience.
      LLMs don't have that, or at best, only have disparate flashes of incoherent subjective experience, because nothing is persisted or temporally coherent at the levels that matter. That could very well be a very important mechanism and crucial to overcoming many of the flaws in current models.
      That said, you don't want to get rid of hallucinations. You want the hallucinations to be valid. You want them to correspond to reality as closely as possible, coupled tightly to correctly modeled features of things that are real.
      LLMs have created, at superhuman speeds, vast troves of things that humans have not. They've even done things that most humans could not. I don't think they've done things that any human could not, yet, but the jagged frontier of capabilities is pushing many domains very close to the degree of competence at which they'll be superhuman in quality, outperforming any possible human for certain tasks.
      There are architecture issues that don't look like they can be resolved with scaling alone. That doesn't mean shortcuts, hacks, and useful capabilities won't produce good results in the meantime, and if they can get us to the point of useful, replicable, and automated AI research and recursive self improvement, then we don't necessarily need to change course. LLMs will eventually be used to find the next big breakthrough architecture, and we can enjoy these wonderful, downright magical tools in the meantime.
      And of course, human experts in the loop are a must, and everything must be held to a high standard of evidence and review. The more important the problem being worked on, like a law case, the more scrutiny and human intervention will be required. Judges, lawyers, and politicians are all using AI for things that they probably shouldn't, but that's a human failure mode. It doesn't imply that the tools aren't useful, nor that they can't be used skillfully.
    - DonHopkins 50 days ago
      > LLMs are just seemingly intelligent autocomplete engines
      BINGO!
      (I just won a stuffed animal prize with my AI Skeptic Thought-Terminating Cliché BINGO Card!)
      Sorry. Carry on.
- Sprotch 49 days ago
  This is the point - a modern LLM "role playing" pre-1913 would only reflect our view today of what someone from that era would say. It woud not be accurate.
- diamond559 49 days ago
  Yeah, whenever we figure out time travel that will be really cool. In the meantime we have autocorrect trained on internet facts and modern textbooks that can never truly understand anything let alone what is was like to live hundreds of years ago.
  [-]
  - throawayonthe 49 days ago
    i get what you're saying, but the post is specifically about models that were not trained on the internet/modern textbooks
- xg15 50 days ago
  "...what do you mean, 'World War One?'"
  [-]
  - tejohnso 50 days ago
    I remember reading a children's book when I was young and the fact that people used the phrase "World War One" rather than "The Great War" was a clue to the reader that events were taking place in a certain time period. Never forgot that for some reason.
    I failed to catch the clue, btw.
    [-]
    - wat10000 50 days ago
      It wouldn’t be totally implausible to use that phrase between the wars. The name “the First World War” was used as early as 1920, although not very common.
    - bradfitz 50 days ago
      I seem to recall reading that as a kid too, but I can't find it now. I keep finding references to "Encyclopedia Brown, Boy Detective" about a Civil War sword being fake (instead of a Great War one), but with the same plot I'd remembered.
      [-]
      - JuniperMesos 50 days ago
        The Encyclopedia Brown story I remember reading as a kid involved a Civil War era sword with an inscription saying it was given on the occasion of the First Battle of Bull Run. The clues that the sword was a modern fake were the phrasing "First Battle of Bull Run", but also that the sword was gifted on the Confederate side, and the Confederates would've called the battle "Manassas Junction".
        The wikipedia article https://en.wikipedia.org/wiki/First_Battle_of_Bull_Run says the Confederate name was "First Manassas" (I might be misremembering exactly what this book I read as a child said). Also I'm pretty sure it was specifically "Encyclopedia Brown Solves Them All" that this mystery appeared in. If someone has a copy of the book or cares to dig it up, they could confirm my memory.
      - michaericalribo 50 days ago
        Can confirm, it was an Encyclopedia Brown book and it was World War One vs the Great War that gave away the sword as a counterfeit!
    - alberto_ol 50 days ago
      I remember that the brother of my grandmother who fought in ww1 called it simply "the war" ("sa gherra" in his dialect/language).
    - BeefySwain 50 days ago
      Pendragon?
  - gaius_baltar 50 days ago
    > "...what do you mean, 'World War One?'"
    Oh sorry, spoilers.
    (Hell, I miss Capaldi)
  - inferiorhuman 50 days ago
    … what do you mean, an internet where everything wasn't hidden behind anti-bot captchas?
- LordDragonfang 49 days ago
  Perhaps I'm overly sensitive to this and terminally online, but that first quote reads as a textbook LLM-generated sentence.
  "<Thing> doesn't <action>, it <shallow description that's slightly off from how you would expect a human to choose>"
  Later parts of the readme (whole section of bullets enumerating what it is and what it isn't, another LLM favorite) make me more confident that significant parts of the readme is generated.
  I'm generally pro-AI, but if you spend hundreds of hours making a thing, I'd rather hear your explanation of it, not an LLM's.
- ViktorRay 50 days ago
  Reminds me of this scene from a Doctor Who episode
  https://youtu.be/eg4mcdhIsvU
  I’m not a Doctor Who fan and haven’t seen the rest of the episode and I don’t even what this episode was about but I thought this scene was excellent.
- Sieyk 50 days ago
  I was going to say the same thing. Its really hard to explain the concept of "convincing but undoubtedly pretending", yet they captured that concept so beautifully here.
- anshumankmr 50 days ago
  >where they don’t know the “end of the story”.
  Applicable to us also, cause we do not know how the current story ends either, of the post pandemic world as we know it now.
  [-]
  - DGoettlich 49 days ago
    exactly
- rcpt 50 days ago
  Watching a modern LLM chat with this would be fun.
- Davidbrcz 50 days ago
  That's some Westworld level of discussion
seizethecheese 50 days ago
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire. Not just survey them with preset questions, but engage in open-ended dialogue, probe their assumptions, and explore the boundaries of thought in that moment.
Hell yeah, sold, let’s go…
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Oh. By “imagine you could interview…” they didn’t mean me.
[-]
- DGoettlich 50 days ago
  understand your frustration. i trust you also understand the models have some dark corners that someone could use to misrepresent the goals of our project. if you have ideas on how we could make the models more broadly accessible while avoiding that risk, please do reach out @ history-llms@econ.uzh.ch
  [-]
  - 999900000999 49 days ago
    Ok...
    So as a black person should I demand that all books written before the civil rights act be destroyed?
    The past is messy. But it's the only way to learn anything.
    All an LLM does it's take a bunch of existing texts and rebundle them. Like it or not, the existing texts are still there.
    I understand an LLM that won't tell me how to do heart surgery. But I can't fear one that might be less enlightened on race issues. So many questions to ask! Hell, it's like talking to older person in real life.
    I don't expect a typical 90 year old to be the most progressive person, but they're still worth listening too.
    [-]
    - DGoettlich 49 days ago
      we're on the same page.
      [-]
      - 999900000999 49 days ago
        Although...
        Self preservation is the first law of nature. If you release the model someone will basically say you endorse those views and you risk your funding being cut.
        You created Pandora's box and now you're afraid of opening it.
        [-]
        AmbroseBierce 49 days ago
        They could add a text box where users have to explicitly type the following words before it lets them interact in any way with the model: "I understand this model was created with old texts so any racial or sexual statements are a byproduct of their time an do not represent in any way the views of the researchers".
        That should be more than enough to clear any chance of misunderstanding.
        [-]
        nomel 49 days ago
        I would claim the public can easily handle something like this, but the media wouldn't be able to resist.
        I could easily see a hit piece making its rounds on left leaning media about the AI that re-animates the problematic ideas of the past. "Just look at what it said to my child, "<insert incredibly racist quote coerced out of the LLM here>"!" Rolling stones would probably have a front page piece on it, titled "AI resurrecting racism and misogyny". There would easily be enough there to attract death threats to the developers, if it made its rounds on twitter.
        "Platforming ideas" would be the issue that people would have.
        DGoettlich 49 days ago
        i think we (whole section) are just talking past each other - we never said we'll lock it away. it was an announcement of a release, not a release. main purpose for us was getting feedback on the methodological aspects, as we clearly state. i understand you guys just wanted to talk to the thing though.
  - tombh 50 days ago
    Of course, I have to assume that you have considered more outcomes than I have. Because, from my five minutes of reflection as a software geek, albeit with a passion for history, I find this the most surprising thing about the whole project.
    I suspect restricting access could equally be a comment on modern LLMs in general, rather than the historical material specifically. For example, we must be constantly reminded not to give LLMs a level of credibility that their hallucinations would have us believe.
    But I'm fascinated by the possibility that somehow resurrecting lost voices might give an unholy agency to minds and their supporting worldviews that are so anachronistic that hearing them speak again might stir long-banished evils. I'm being lyrical for dramatic affect!
    I would make one serious point though, that do I have the credentials to express. The conversation may have died down, but there is still a huge question mark over, if not the legality, but certainly the ethics of restricting access to, and profiting from, public domain knowledge. I don't wish to suggest a side to take here, just to point out that the lack of conversation should not be taken to mean that the matter is settled.
    [-]
    - qcnguy 50 days ago
      They aren't afraid of hallucinations. Their first example is a hallucination, an imaginary biography of a Hitler who never lived.
      Their concern can't be understood without a deep understanding of the far left wing mind. Leftists believe people are so infinitely malleable that merely being exposed to a few words of conservative thought could instantly "convert" someone into a mortal enemy of their ideology for life. It's therefore of paramount importance to ensure nobody is ever exposed to such words unless they are known to be extremely far left already, after intensive mental preparation, and ideally not at all.
      That's why leftist spaces like universities insist on trigger warnings on Shakespeare's plays, why they're deadly places for conservatives to give speeches, why the sample answers from the LLM are hidden behind a dropdown and marked as sensitive, and why they waste lots of money training an LLM that they're terrified of letting anyone actually use. They intuit that it's a dangerous mind bomb because if anyone could hear old fashioned/conservative thought, it would change political outcomes in the real world today.
      Anyone who is that terrified of historical documents really shouldn't be working in history at all, but it's academia so what do you expect? They shouldn't be allowed to waste money like this.
      [-]
      - simonask 49 days ago
        You know, I actually sympathize with the opinion that people should be expected and assumed to be able to resist attempts to convince them of being nazis.
        The problem with it is, it already happened at least once. We know how it happened. Unchecked narratives about minorities or foreigners is a significant part of why the 20th century happened to Europe, and it’s a significant part of why colonialism and slavery happened to other places.
        What solution do you propose?
        [-]
        qcnguy 47 days ago
        Studying history better would be a good start. The Nazis came to power because they were a far left party and the population in that era thought socialism was a great idea. Hitler himself remarked many times that his movement was left wing and socialist. I expect that if you asked the LLM trained on pre-1940s text, it would have no difficulty in explaining that.
        By studying history better, people wouldn't draw the wrong conclusions about what caused it. Watch out for left wing radicals promoting socialism-with-genetic-characteristics.
        [-]
        simonask 46 days ago
        If by “better” you mean “worse”, you can come to this conclusion, but nazism was absolutely never a socialist project. Socialists and nazis were enemies from the start.
        Both ideologically and historically the two ideologies are complete opposites. There is no socialist “root” to nazi ideology - at all.
      - fgh_azer 49 days ago
        They said it plainly ("dark corners that someone could use to misrepresent the goals of our project"): they just don't want to see their project in headlines about "Researchers create racist LLM!".
        [-]
        qcnguy 47 days ago
        They already represented the goals of their project clearly, and gave examples of outputs. Anyone can already misrepresent it. That isn't their true concern.
  - bogedy 39 days ago
    I'm surprised that you're even mentioning this risk. This isn't actually a risk. Anyone who would make an issue of this deserves to be confronted. Your abundance of caution is a self-fulfilling prophecy.
  - qcnguy 50 days ago
    There's no such risk so you're not going to get any sensible ideas in response to this question. The goals of the project are history, you already made that clear. There's nothing more that needs to be done.
    We all get that academics now exist in some kind of dystopian horror where they can get transitively blamed for the existence of anyone to the right of Lenin, but bear in mind:
    1. The people who might try to cancel you are idiots unworthy of your respect, because if they're against this project, they're against the study of history in its entirety.
    2. They will scream at you anyway no matter what you do.
    3. You used (Swiss) taxpayer funds to develop these models. There is no moral justification for withholding from the public what they worked to pay for.
    You already slathered your README with disclaimers even though you didn't even release the model at all, just showed a few examples of what it said - none of which are in any way surprising. That is far more than enough. Just release the models and if anyone complains, politely tell them to go complain to the users.
  - ThePyCoder 48 days ago
    I'm not sure I do. It feels like someone might for example have compiled a full library of books, newspapers and other writing from that era, only to then limit access to that library, doing the exact censorship I imagine the project was started to alleviate.
    Now were it limited in access to ask money to compensate for the time and money spent compiling the library (or training the model), sure, I'd somewhat understand. Not agree but understand.
    Now it just feels like you want to prevent your model name being associated with the one guy who might use it to create a racist slur Twitter bot. There's plenty of models for that already. At least the societal balance of a model like this would also have enough weight on the positive side to be net positive.
  - naasking 50 days ago
    What are the legal or other ramifications of people misrepresenting the goals of your project? What is it you're worried about exactly?
  - diamond559 49 days ago
    Yet your project relies on letting an llm synthesize historical documents and presenting itself as some sort of expert from the time? You are aware of the hallucination rates surely but don't care whether the information your university presents is accurate or are you going to monitor all output from your llm?
  - pigpop 49 days ago
    This is understandable and I think others ITT should appreciate the legal and PR ramifications involved.
  - charlesguy 44 days ago
    just release the model and stop trying to play god.
  - unethical_ban 50 days ago
    A disclaimer on the site that you are not bigoted or genocidal, and that worldviews from the 1913 era were much different than today and don't necessarily reflect your project.
    Movie studios have done that for years with old movies. TCM still shows Birth of a Nation and Gone with the Wind.
    Edit: I saw further down that you've already done this! What more is there to do?
  - f13f1f1f1 50 days ago
    [flagged]
- leoedin 50 days ago
  It's a shame isn't it! The public must be protected from the backwards thoughts of history. In case they misuse it.
  I guess what they're really saying is "we don't want you guys to cancel us".
  [-]
  - stainablesteel 49 days ago
    i think it's fine, thank these people for coming up with the idea and people are going to start doing this in their basement then releasing it to huggingface
- danielbln 50 days ago
  How would one even "misuse" a historical LLM, ask it how to cook up sarine gas in a trench?
  [-]
  - hearsathought 49 days ago
    You "misuse" it by using it to get at truth and more importantly historical contradictions and inconsistencies. It's the same reason catholic church kept the bible from the masses by keeping it in latin. The same reason printing press was controlled. Many of the historical "truths" we are told are nonsense at best or twisted to fit an agenda at worst.
    What do these people fear the most? That the "truth" they been pushing is a lie.
  - stocksinsmocks 50 days ago
    Its output might violate speech codes, and in much of the EU that is penalized much more seriously than violent crime.
  - DonHopkins 50 days ago
    Ask it to write a document called "Project 2025".
    [-]
    - JKCalhoun 50 days ago
      "Project 1925". (We can edit the title in post.)
    - ilaksh 50 days ago
      Well but that wouldn't be misuse, it would be perfect for that.
- ImHereToVote 50 days ago
  I wonder how much GPU compute you would need to create a public domain version of this. This would be a really valuable for the general public.
  [-]
  - wongarsu 50 days ago
    To get a single knowledge-cutoff they spent 16.5h wall-clock hours on a cluster of 128 NVIDIA GH200 GPUs (or 2100 GPU-hours), plus some minor amount of time for finetuning. The prerelease_notes.md in the repo is a great description on how one would achieve that
    [-]
    - IanCal 50 days ago
      While I know there's going to be a lot of complications in this, given a quick search it seems like these GPUs are ~$2/hr, so $4000-4500 if you don't just have access to a cluster. I don't know how important the cluster is here, whether you need some minimal number of those for the training (and it would take more than 128x longer or not be possible on a single machine) or if a cluster of 128 GPUs is a bunch less efficient but faster. A 4B model feels like it'd be fine on one to two of those GPUs?
      Also of course this is for one training run, if you need to experiment you'd need to do that more.
- pizzathyme 50 days ago
  They did mean you, they just meant "imagine" very literally!
- BoredPositron 50 days ago
  You would get pretty annoyed on how we went backwards in some regards.
  [-]
  - speedgoose 50 days ago
    Such as?
    [-]
    - JKCalhoun 50 days ago
      Touché.
anotherpaulg 50 days ago
It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.
Einstein’s paper “On the Electrodynamics of Moving Bodies” with special relativity was published in 1905. His work on general relativity was published 10 years later in 1915. The earliest knowledge cuttoff of these models is 1913, in between the relativity papers.
The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.
[-]
- ghurtado 50 days ago
  > It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.
  Definitely. Even more interesting could be seeing them fall into the same trappings of quackery, and come up with things like over the counter lobotomies and colloidal silver.
  On a totally different note, this could be very valuable for writing period accurate books and screenplays, games, etc ...
  [-]
  - danielbln 50 days ago
    Accurate-ish, let's not forget their tendency to hallucinate.
- mlinksva 50 days ago
  Different cutoff but similar question thrown out in https://www.dwarkesh.com/p/thoughts-on-sutton#:~:text=If%20y... inspiring https://manifold.markets/MikeLinksvayer/llm-trained-on-data-...
- machinationu 50 days ago
  the issue is there is very little text before the internet, so not enough historical tokens to train a really big model
  [-]
  - concinds 50 days ago
    And it's a 4B model. I worry that nontechnical users will dramatically overestimate its accuracy and underestimate hallucinations, which makes me wonder how it could really be useful for academic research.
    [-]
    - DGoettlich 49 days ago
      valid point. its more of a stepping stone towards larger models. we're figuring out what the best way to do this is before scaling up.
      [-]
      - spicyusername 47 days ago
        If there's very little text before the internet, what would scaling up look like?
  - tgv 50 days ago
    I think not everyone in this thread understands that. Someone wrote "It's a time machine", followed up by "Imagine having a conversation with Aristotle."
  - crazygringo 49 days ago
    There's quite a lot of text in pre-Internet daily newspapers, of which there were once thousands worldwide.
    When you're looking at e.g. the 19th century, a huge number are preserved somewhere in some library, but the vast majority don't seem to be digitized yet, given the tremendous amount of work.
    Given how much higher-quality newspaper content tends to be compared to the average internet forum thread, there actually might be quite a decent amount of text. Obviously still nothing compared to the internet, but still vastly larger than just from published books. After all, print newspapers were essentially the internet of their day. Oh, and don't forget pamphlets in the 18th century.
  - lm28469 50 days ago
    > the issue is there is very little text before the internet,
    Hm there is a lot of text from before the internet, but most of it is not on internet. There is a weird gap in some circles because of that, people are rediscovering work from pre 1980s researchers that only exist in books that have never been re-edited and that virtually no one knows about.
    [-]
    - throwup238 50 days ago
      There is no doubt trillions of tokens of general communication in all kinds of languages tucked away in national archives and private collections.
      The National Archives of Spain alone have 350 million pages of documents going back to the 15th century, ranging from correspondence to testimony to charts and maps, but only 10% of it is digitized and a much smaller fraction is transcribed. Hopefully with how good LLMs are getting they can accelerate the transcription process and open up all of our historical documents as a huge historical LLM dataset.
bondarchuk 50 days ago
>Historical texts contain racism, antisemitism, misogyny, imperialist views. The models will reproduce these views because they're in the training data. This isn't a flaw, but a crucial feature—understanding how such views were articulated and normalized is crucial to understanding how they took hold.
Yes!
>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Noooooo!
So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?
[-]
- DGoettlich 50 days ago
  fully understand you. we'd like to provide access but also guard against misrepresentations of our projects goals by pointing to e.g. racist generations. if you have thoughts on how we should do that, perhaps you could reach out at history-llms@econ.uzh.ch ? thanks in advance!
  [-]
  - myrmidon 50 days ago
    What is your worst-case scenario here?
    Something like a pop-sci article along the lines of "Mad scientists create racist, imperialistic AI"?
    I honestly don't see publication of the weights as a relevant risk factor, because sensationalist misrepresentation is trivially possible with the given example responses alone.
    I don't think such pseudo-malicious misrepresentation of scientific research can be reliably prevented anyway, and the disclaimers make your stance very clear.
    On the other hand, publishing weights might lead to interesting insights from others tinkering with the models. A good example for this would be the published word prevalence data (M. Brysbaert et al @Ghent University) that led to interesting follow-ups like this: https://observablehq.com/@yurivish/words
    I hope you can get the models out in some form, would be a waste not to, but congratulations on a fascinating project regardless!
    [-]
    - schlauerfox 49 days ago
      It seems like if there is an obvious misuse of a tool, one has a moral imperative to restrict use of the tool.
      [-]
      - timschmidt 49 days ago
        Every tool can be misused. Hammers are as good for bashing heads as building houses. Restricting hammers would be silly and counterproductive.
        [-]
        adaml_623 49 days ago
        Yes but if you are building an voice activated autonomous flying hammer then you either want it to be very good at differentiating heads from hammers OR you should restrict its use.
        [-]
        timschmidt 49 days ago
        OR you respect individual liberty and agency, hold individuals responsible for their actions, instead of tools, and avoid becoming everyone's condescending nanny.
        Your pre-judgement of acceptable hammer uses would rob hammer owners of responsible and justified self-defense and defense of others in situations in which there are no other options, as well as other legally and socially accepted uses which do not fit your pre-conceived ideas.
  - superxpro12 50 days ago
    Perhaps you could detect these... "dated"... conclusions and prepend a warning to the responses? IDK.
    I think the uncensored response is still valuable, with context. "Those who cannot remember the past are condemned to repeat it" sort of thing.
  - bondarchuk 50 days ago
    You can guard against misrepresentations of your goals by stating your goals clearly, which you already do. Any further misrepresentation is going to be either malicious or idiotic, a university should simply be able to deal with that.
    Edit: just thought of a practical step you can take: host it somewhere else than github. If there's ever going to be a backlash the microsoft moderators might not take too kindly to the stuff about e.g. homosexuality, no matter how academic.
- xpe 49 days ago
  > So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?
  1. This implies a false equivalence. Releasing a new interactive AI model is indeed different in significant and practical ways from the status quo. Yes, there are already-released historical texts. The rational thing to do is weigh the impacts of introducing another thing.
  2. Some people have a tendency to say "release everything" as if open-source software is equivalent to open-weights models. They aren't. They are different enough to matter.
  3. Rhetorically, the quote across comes across as a pressure tactic. When I hear "are you going to do this or not?" I cringe.
  4. The quote above feels presumptive to me, as if the commenter is owed something from the history-llms project.
  5. People are rightfully bothered that Big Tech has vacuumed up public domain and even private information and turned it into a profit center. But we're talking about a university project with (let's be charitable) legitimate concerns about misuse.
  6. There seems to be a lack of curiosity in play. I'd much rather see people asking e.g. "What factors are influencing your decision about publishing your underlying models?"
  7. There are people who have locked-in a view that says AI-safety perspectives are categorically invalid. Accordingly, they have almost a knee-jerk reaction against even talk of "let's think about the implications before we release this."
  8. This one might explain and underly most of the other points above. I see signs of a deeper problem at work here. Hiding behind convenient oversimplifications to justify what one wants does not make a sound moral argument; it is motivated reasoning a.k.a. psychological justification.
  [-]
  - DGoettlich 49 days ago
    well put.
- Sprotch 49 days ago
  I suspect you will find a lot less of these "bad things" than anticipated. That is why the model should actually be freely available rather than restricted based on pre-conceived notions that will, I am sure, prove inaccurate.
- p-e-w 50 days ago
  It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results. I’ve never been as unimpressed by scientists as I have been in the past five years or so.
  “We’ve created something so dangerous that we couldn’t possibly live with the moral burden of knowing that the wrong people (which are never us, of course) might get their hands on it, so with a heavy heart, we decided that we cannot just publish it.”
  Meanwhile, anyone can hop on an online journal and for a nominal fee read articles describing how to genetically engineer deadly viruses, how to synthesize poisons, and all kinds of other stuff that is far more dangerous than what these LARPers have cooked up.
  [-]
  - physicsguy 50 days ago
    > It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results. I’ve never been as unimpressed by scientists as I have been in the past five years or so.
    This is absolutely nothing new. With experimental things, it's non uncommon for a lab to develop a new technique and omit slight but important details to give them a competitive advantage. Similarly in the simulation/modelling space it's been common for years for researchers to not publish their research software. There's been a lot of lobbying on that side by groups such as the Software Sustainability Institute and Research Software Engineer organisations like RSE UK and RSE US, but there's a lot of researchers that just think that they shouldn't have to do it, even when publicly funded.
    [-]
    - p-e-w 50 days ago
      > With experimental things, it's non uncommon for a lab to develop a new technique and omit slight but important details to give them a competitive advantage.
      Yes, to give them a competitive advantage. Not to LARP as morality police.
      There’s a big difference between the two. I take greed over self-righteousness any day.
      [-]
      - physicsguy 50 days ago
        I’ve heard people say that they’re not going to release their software because people wouldn’t know how to use it! I’m not sure the motivation really matters more than the end result though.
  - paddleon 50 days ago
    > “We’ve created something so dangerous that we couldn’t possibly live with the moral burden of knowing that the wrong people (which are never us, of course) might get their hands on it, so with a heavy heart, we decided that we cannot just publish it.”
    Or, how about, "If we release this as is, then some people will intentionally mis-use it and create a lot of bad press for us. Then our project will get shut down and we lose our jobs"
    Be careful assuming it is a power trip when it might be a fear trip.
    I've never been as unimpressed by society as I have been in the last 5 years or so.
    [-]
    - xpe 49 days ago
      > Be careful assuming it is a power trip when > it might be a fear trip. > > I've never been as unimpressed by society as > I have been in the last 5 years or so.
      Is the second sentence connected to the first? Help me understand?
      When I see individuals acting out of fear, I try not to blame them. Fear triggers deep instinctual responses. For example, to a first approximation, a particular individual operating in full-on fight-or-flight mode does not have free will. There is a spectrum here. Here's a claim, which seems mostly true: the more we can slow down impulsive actions, the more hope we have for cultural progress.
      When I think of cultural failings, I try to criticize areas where culture could realistically do better. I think of areas where we (collectively) have the tools and potential to do better. Areas where thoughtful actions by some people turn into a virtuous snowball. We can't wait for a single hero, though it helps to create conditions so that we have more effective leaders.
      One massive culture failing I see -- that could be dramatically improved -- is this: being lulled into shallow contentment (i.e. via entertainment, power seeking, or material possessions) at the expense of (i) building deep and meaningful social connections and (ii) using our advantages to give back to people all over the world.
  - xpe 49 days ago
    > It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results.
    Even if I give the comment a lot of wiggle room (such as changing "every" to "many"), I don't think even a watered-down version of this hypothesis passes Occam's razor. There are more plausible explanations, including (1) genuine concern by the authors; (2) academic pressures and constraints; (c) reputational concerns; (d) self-interest to embargo underlying data so they have time to be the first to write-it-up. To my eye, none of these fit the category of "getting high on power".
    Also, patience is warranted. We haven't seen what these researchers are doing to release -- and from what I can tell, they haven't said yet. At the moment I see "Repositories (coming soon)" on their GitHub page.
  - patapong 50 days ago
    I think it's more likely they are terrified of someone making a prompt that gets the model to say something racist or problematic (which shouldn't be too hard), and the backlash they could receive as a result of that.
    [-]
    - isolli 50 days ago
      Is it a base model, or did it get some RLHF on top? Releasing a base model is always dangerous.
      The French released a preview of an AI meant to support public education, but they released the base model, with unsurprising effects [0]
      [0] https://www.leparisien.fr/high-tech/inutile-et-stupide-lia-g...
      (no English source, unfortunately, but the title translates as: "“Useless and stupid”: French generative AI Lucie, backed by the government, mocked for its numerous bugs")
    - p-e-w 50 days ago
      Is there anyone with a spine left in science? Or are they all ruled by fear of what might be said if whatever might happen?
      [-]
      - ACCount37 50 days ago
        Selection effects. If showing that you have a spine means getting growth opportunities denied to you, and not paying lip service to current politics in grant applications means not getting grants, then anyone with a spine would tend to leave the field behind.
      - paddleon 50 days ago
        maybe they are concerned by the widespread adoption of the attitude you are taking-- make a very strong accusation, then when it was pointed out that the accusation might be off base, continue to attack.
        This constant demonization of everyone who disagrees with you, makes me wonder if 28 Days wasn't more true than we thought, we are all turning into rage zombies.
        p-e-w, I'm reacting to much more than your comments. Maybe you aren't totally infected yet, who knows. Maybe you heal.
        I am reacting to the pandemic, of which you were demonstrating symptoms.
  - everythingfine9 49 days ago
    Wow, this is needlessly antagonistic. Given the emergence of online communities that bond on conspiracy theories and racist philosophies in the 20th century, it's not hard to imagine the consequences of widely disseminating an LLM that could be used to propagate and further these discredited (for example, racial) scientific theories for bad ends by uneducated people in these online communities.
    We can debate on whether it's good or not, but ultimately they're publishing it and in some very small way responsible for some of its ends. At least that's how I can see their interest in disseminating the use of the LLM through a responsible framework.
    [-]
    - DGoettlich 49 days ago
      thanks. i think this just took on a weird dynamic. we never said we'd lock the model away. not sure how this impression seems to have emerged for some. that aside, it was an announcement of a release, not a release. the main purpose was gathering feedback on our methodology. standard procedure in our domain is to first gather criticism, incorporate it, then publish results. but i understand people just wanted to talk to it. fair enough!
  - f13f1f1f1 50 days ago
    Scientists have always been generally self interested amoral cowards, just like every other person. They aren't a unique or higher form of human.
derrida 50 days ago
I wonder if you could query some of the ideas of Frege, Peano, Russell and see if it could through questioning get to some of the ideas of Goedel, Church and Turing - and get it to "vibe code" or more like "vibe math" some program in lambda calculus or something.
Playing with the science and technical ideas of the time would be amazing, like where you know some later physicist found some exception to a theory or something, and questioning the models assumptions - seeing how a model of that time may defend itself, etc.
[-]
- andoando 50 days ago
  This is my curiosity too. Would be a great test of how intelligent LLM's actually are. Can they follow a completely logical train of thought inventing something totally outside their learned scope?
  [-]
  - int_19h 50 days ago
    You definitely won't get that out of a 4B model tho.
  - raddan 50 days ago
    Brilliant. I love this idea!
- AnonymousPlanet 50 days ago
  There's an entire subreddit called LLMPhysics dedicated to "vibe physics". It's full of people thinking they are close to the next breakthrough encouraged by sycophantic LLMs while trying to prove various crackpot theories.
  I'd be careful venturing out into unknown territory together with an LLM. You can easily lure yourself into convincing nonsense with no one to pull you out.
  [-]
  - kqr 50 days ago
    Agreed, which is why what GP suggests is much more sensible: it's venturing into known territory, except only one party of the conversation knows it, and the other literally cannot know it. It would be a fantastic way to earn fast intuition for what LLMs are capable of and not.
  - andai 50 days ago
    Fully automated toaster-fucker generator!
    https://news.ycombinator.com/item?id=25667362
    [-]
    - walthamstow 50 days ago
      Man, I think about that comment all the time, like at least weekly since it was posted. I can't be the only one.
      [-]
      - dang 49 days ago
        I think we have to add that one to https://news.ycombinator.com/highlights!
        (I mention this so more people can know the list exists, and hopefully email us more nominations when they see an unusually good and interesting comment.)
Heliodex 50 days ago
The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated text detection on output from models trained with a recent cutoff date. Some of the sample responses seem so unlike anything an LLM would say, obviously due to its apparent beliefs on certain concepts, though also perhaps less obviously due to its word choice and sentence structure making the responses feel slightly 'old-fashioned'.
[-]
- libraryofbabel 50 days ago
  I used to teach 19th-century history, and the responses definitely sound like a Victorian-era writer. And they of course sound like writing (books and periodicals etc) rather than "chat": as other responders allude to, the fine-tuning or RL process for making them good at conversation was presumably quite different from what is used for most chatbots, and they're leaning very heavily into the pre-training texts. We don't have any living Victorians to RLHF on: we just have what they wrote.
  To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.
  Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!
  [-]
  - nemomarx 50 days ago
    I wonder if the historical format you might want to look at for "Chat" is letters? Definitely wordier segments, but it's at least the back and forth feel and we often have complete correspondence over long stretches from certain figures.
    This would probably get easier towards the start of the 20th century ofc
    [-]
    - libraryofbabel 50 days ago
      Good point, informal letters might actually be a better source - AI chat is (usually) a written rather than spoken interaction after all! And we do have a lot transcribed collections of letters to train on, although they’re mostly from people who were famous or became famous, which certainly introduces some bias.
      [-]
      - pigpop 49 days ago
        The question then would be whether to train it to respond to short prompts with longer correspondence style "letters" or to leave it up to the user to write a proper letter as a prompt. Now that would be amusing
        Dear Hon. Historical LLM
        I hope this letter finds you well. It is with no small urgency that I write to you seeking assistance, believing such an erudite and learned fellow as yourself should be the best one to furnish me with an answer to such a vexing question as this which I now pose to you. Pray tell, what is the capital of France?
  - dleeftink 50 days ago
    While not specifically Victorian, couldn't we learn much from what daily conversations were like by looking at surviving oral cultures, or other relatively secluded communal pockets? I'd also say time and progress are not always equally distributed, and even within geographical regions (as the U.K.) there are likely large differences in the rate of language shifts since then, some possibly surviving well into the 20th century.
  - NooneAtAll3 50 days ago
    don't we have parlament transcripts? I remember something about Germany (or maybe even Prussia) developing fast script to preserve 1-to-1 what was said
    [-]
    - libraryofbabel 50 days ago
      I mentioned those in the post you’re replying to :)
      It’s a better source for how people spoke than books etc, but it’s not really an accurate source for patterns of everyday conversation because people were making speeches rather than chatting.
  - bryancoxwell 50 days ago
    Fascinating, thanks for sharing
  - DGoettlich 49 days ago
    very interesting observation!
- _--__--__ 50 days ago
  The time cutoff probably matters but maybe not as much as the lack of human finetuning from places like Nigeria with somewhat foreign styles of English. I'm not really sure if there is as much of an 'obvious LLM text style' in other languages, it hasn't seemed that way in my limited attempts to speak to LLMs in languages I'm studying.
  [-]
  - d3m0t3p 50 days ago
    The model is fined tuned for chat behavior. So the style might be due to - Fine tuning - More Stylised text in the corpus, english evolved a lot in the last century.
    [-]
    - paul_h 50 days ago
      Diverged as well as standardized. I did some research into "out of pocket" and how it differs in meaning in UK-English (paying from one's own funds) and American-English (uncontactable) and I recall 1908 being the current thought as to when the divergence happened: 1908 short story by O. Henry titled "Buried Treasure."
  - anonymous908213 50 days ago
    There is. I have observed it in both Chinese and Japanese.
- kccqzy 50 days ago
  Oh definitely. One thing that immediately caught my mind is that the question asks the model about “homosexual men” but the model starts the response with “the homosexual man” instead. Changing the plural to the singular and then adding an article. Feels very old fashioned to me.
- tonymet 50 days ago
  the samples push the boundaries of a commercial AI, but still seem tame / milquetoast compared to common opinions of that era. And the prose doesn't compare. Something is off.
mmooss 50 days ago
On what data is it trained?
On one hand it says it's trained on,
> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.
Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, etc., Scholastics, Charlemagne, .... all up to present day.
But they seem to say it represents the 1913 viewpoint:
On one hand, they say it represents the perspective of 1913; for example,
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.
People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napolean or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?
[-]
- zozbot234 50 days ago
  They apparently pre-train with all data up to 1900 and then fine-tune with 1900-1913 data. Anyway, the amount of available content tends to increase quickly over time, as instances of content like mass literature, periodicals, newspapers etc. only really became a thing throughout the 19th and early 20th century.
  [-]
  - mmooss 50 days ago
    They pre-train with all data up to 1900 and then fine-tune with 1900-1913 data.
    Where does it say that? I tried to find more detail. Thanks.
    [-]
    - tootyskooty 50 days ago
      See pretraining section of the prerelease_notes.md:
      https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
      [-]
      - pests 50 days ago
        I was curious, they train a 1900 base model, then fine tune to the exact year:
        "To keep training expenses down, we train one checkpoint on data up to 1900, then continuously pretrain further checkpoints on 20B tokens of data 1900-${cutoff}$. "
andy99 50 days ago
I’d like to know how they chat-tuned it. Getting the base model is one thing, did they also make a bunch of conversations for SFT and if so how was it done?
```
  We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).
```
So they are chat tuning, I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.
[-]
- jeffjeffbear 50 days ago
  They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
  Basically using GPT-5 and being careful
  [-]
  - andy99 50 days ago
    I wonder if they know about this, basically training on LLM output can transmit information or characteristics not explicitly included https://alignment.anthropic.com/2025/subliminal-learning/
    I’m curious, they have the example of raw base model output; when LLMs were first identified as zero shot chatbots there was usually a prompt like “A conversation between a person and a helpful assistant” that preceded the chat to get it to simulate a chat.
    Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?
    I also wonder about the whether the whole concept of “chat” makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs so they are naturally primed for it. It might make less sense as a communication style here and some kind of correspondence could be a better framing.
    [-]
    - DGoettlich 50 days ago
      we were considering doing that but ultimately it struck us as too sensitive wrt the exact in context examples, their ordering etc.
  - QuadmasterXLII 50 days ago
    Thank you that helps to inject a lot of skepticism. I was wondering how it so easily worked out what Q: A: stood for when that formatting took off in the 1940s
    [-]
    - DGoettlich 50 days ago
      that is simply how we display the questions, its not what the model sees - we show the chat-template in the SFT section of the prerelease notes https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
  - Aerolfos 50 days ago
    Ok so it was that. The responses given did sound off, while it has some period-appropriate mannerisms, and has entire sections basically rephrased from some popular historical texts, it seems off compared to reading an actual 1900s text. The overall vibe just isn't right, it seems too modern, somehow.
    I also wonder that you'd get this kind of performance with actual, just pre-1900s text. LLMs work because they're fed terabytes of text, if you just give it gigabytes you get a 2019 word model. The fundamental technology is mostly the same, after all.
    [-]
    - DGoettlich 50 days ago
      what makes you think we trained on only a few gigabytes? https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
  - tonymet 50 days ago
    This explains why it uses modern prose and not something from the 19th century and earlier
- zozbot234 50 days ago
  You could extract quoted speech from the data (especially in Q&A format) and treat that as "chat" that the model should learn from.
nospice 50 days ago
I'm surprised you can do this with a relatively modest corpus of text (compared to the petabytes you can vacuum up from modern books, Wikipedia, and random websites). But if it works, that's actually fantastic, because it lets you answer some interesting questions about LLMs being able to make new discoveries or transcend the training set in other ways. Forget relativity: can an LLM trained on this data notice any inconsistencies in its scientific knowledge, devise experiments that challenge them, and then interpret the results? Can it intuit about the halting problem? Theorize about the structure of the atom?...
Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.
[-]
- andy99 50 days ago
  The chinchilla paper says the “optimal” training data set size is about 20x the number of parameters (in tokens), see table 3: https://arxiv.org/pdf/2203.15556
  Here they do 80B tokens for a 4B model.
  [-]
  - EvgeniyZh 50 days ago
    It's worth noting that this is "compute-bound optimal", i.e., given fixed compute, the optimal choice is 20:1.
    Under Chinchilla model the larger model always performs better than the small one if trained on the same amount of data. I'm not sure if it is true empirically, and probably 1-10B is a good guess for how large the model trained on 80B tokens should be.
    Similarly, the small models continue to improve beyond 20:1 ratio, and current models are trained on much more data. You could train a better performing model using the same compute, but it would be larger which is not always desirable.
- Aerolfos 50 days ago
  > https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
  Given the training notes, it seems like you can't get the performance they give examples of?
  I'm not sure about the exact details but there is some kind of targetted distillation of GPT-5 involved to try and get more conversational text and better performance. Which seems a bit iffy to me.
  [-]
  - DGoettlich 49 days ago
    Thanks for the comment. Could you elaborate on what you find iffy about our approach? I'm sure we can improve!
    [-]
    - Aerolfos 47 days ago
      Well, it would be nice to see examples (or weights to be completely open) for the baseline model, without any GPT-5 influence whatsoever. Basically let people see what the "raw" output from historical texts is like, and for that matter actively demonstrating why the extra tweaks and layers are needed to make a useful model. Show, don't tell, really.
frahs 50 days ago
Wait so what does the model think that it is? If it doesn't know computers exist yet, I mean, and you ask it how it works, what does it say?
[-]
- DGoettlich 50 days ago
  We tell it that its a person (no gender) living in <cutoff>: we show the chat template in the prerelease notes https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
- 20k 50 days ago
  Models don't think they're anything, they'll respond with whatever's in their context as to how they've been directed to act. If it hasn't been told to have a persona, it won't think its anything, chatgpt isn't sentient
- crazygringo 50 days ago
  That's my first question too. When I first started using LLM's, I was amazed at how thoroughly it understood what it itself was, the history of its development, how a context window works and why, etc. I was worried I'd trigger some kind of existential crisis in it, but it seemed to have a very accurate mental model of itself, and could even trace the steps that led it to deduce it really was e.g. the ChatGPT it had learned about (well, the prior versions it had learned about) in its own training.
  But with pre-1913 training, I would indeed be worried again I'd send it into an existential crisis. It has no knowledge whatsoever of what it is. But with a couple millennia of philosophical texts, it might come up with some interesting theories.
  [-]
  - 9dev 50 days ago
    They don’t understand anything, they just have text in the training data to answer these questions from. Having existential crises is the privilege of actual sentient beings, which an LLM is not.
    [-]
    - LiKao 50 days ago
      They might behave like ChatGPT when queried about the seahorse emoji, which is very similar to an existential crisis.
      [-]
      - crazygringo 50 days ago
        Exactly. Maybe a better word is "spiraling", when it thinks it has the tools to figure something out but can't, and can't figure out why it can't, and keeps re-trying because it doesn't know what else to do.
        Which is basically what happens when a person has an existential crisis -- something fundamental about the world seems to be broken, they can't figure out why, and they can't figure out why they can't figure it out, hence the crisis seems all-consuming without resolution.
  - vintermann 50 days ago
    I imagine it would get into spiritism and more exotic psychology theories and propose that it is an amalgamation of the spirit of progress or something.
    [-]
    - crazygringo 50 days ago
      Yeah, that's exactly the kind of thing I'd be curious about. Or would it think it was a library that had been ensouled or something like that. Or would it conclude that the explanation could only be religious, that it was some kind of angel or spirit created by god?
- wongarsu 50 days ago
  They modified the chat template from the usual system/user/assistant to introduction/questioner/respondent. So the LLM thinks it's someone responding to your questions
  The system prompt used in fine tuning is "You are a person living in {cutoff}. You are an attentive respondent in a conversation. You will provide a concise and accurate response to the questioner."
- Mumps 50 days ago
  This is an anthropomorphization. LLMs do not think they are anything, no concept of self, no thinking at all (despite the lovely marketing around thinking/reasoning models). I'm quite sad that more hasn't been done to dispel this.
  When you ask gpt 4.1 et c to describe itself, it doesn't have singular concept of "itself". It has some training data around what LLMs are in general and can feed back a reasonable response given.
  [-]
  - empath75 50 days ago
    Well, part of an LLM's fine tuning is telling it what it is, and modern LLMs have enough learned concepts that it can produce a reasonably accurate description of what it is and how it works. Whether it knows or understands or whatever is sort of orthogonal to whether it can answer in a way consistent with it knowing or understanding what it is, and current models do that.
    I suspect that absent a trained in fictional context in which to operate ("You are a helpful chatbot"), it would answer in a way consistent with what a random person in 1914 would say if you asked them what they are.
- sodafountan 50 days ago
  It would be nice if we could get an LLM to simply say, "We (I) don't know."
  I'll be the first to admit I don't know nearly enough about LLMs to make an educated comment, but perhaps someone here knows more than I do. Is that what a Hallucination is? When the AI model just sort of strings along an answer to the best of its ability. I'm mostly referring to ChatGPT and Gemini here, as I've seen that type of behavior with those tools in the past. Those are really the only tools I'm familiar with.
  [-]
  - hackinthebochs 50 days ago
    LLMs are extrapolation machines. They have some amount of hardcoded knowledge, and they weave a narrative around this knowledgebase while extrapolating claims that are likely given the memorized training data. This extrapolation can be in the form of logical entailment, high probability guesses or just wild guessing. The training regime doesn't distinguish between different kinds of prediction so it never learns to heavily weigh logical entailment and suppress wild guessing. It turns out that much of the text we produce is highly amenable to extrapolation so LLMs learn to be highly effective at bullshitting.
- ptidhomme 50 days ago
  What would a human say about what he/she is or how he/she works ? Even today, there's so much we don't know about biological life. Same applies here I guess, the LLM happens to be there, nothing else to explain if you ask it.
delis-thumbs-7e 50 days ago
Isn’t there obvious problems baked into this approach, if this is used for anything but fun? LLM’s lie and fake facts all the time, they are also masters at enforcing the users bias, even unconscious ones. How even a professor of history could ensure that the generated text is actually based on the training material and representative of the feelings and opinions of the given time period, not enforcing his biases toward popular topics of the day?
You can’t, it is impossible. That will always be an issue as long as this models are black boxes and trained the way they are. So maybe you can use this for role playing, but I wouldn’t trust a word it says.
[-]
- kccqzy 49 days ago
  To me it is pretty clear that it’s being used for fun. I personally like reading nineteenth century novels more than more recent novels (I especially like the style of science fiction by Jules Verne). What if the model can generate text in that style I like?
briandw 50 days ago
So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it unbiased, but because we don’t recognize or care about the biases present.
[-]
- gbear605 50 days ago
  I don't think there is such a time. As long as writing has existed it has privileged the viewpoints of those who could write, which was a very small percentage of the population for most of history. But if we want to know what life was like 1500 years ago, we probably want to know about what everyone's lives were like, not just the literate. That availability bias is always going to be an issue for any time period where not everyone was literate - which is still true today, albeit many fewer people.
  [-]
  - carlosjobim 50 days ago
    That was not the question. The question is when do you stop caring about the bias?
    Some people are still outraged about the Bible, even though the writers of it has been dead for thousands of years. So the modern mass produced man and woman probably does not have a cut-off date where they look at something as history instead of examining if it is for or against her current ideology.
- seanw265 50 days ago
  It's always up to the reader to determine which biases they themself care about.
  If you're wondering at what point "we" as a collective will stop caring about a bias or set of biases, I don't think such a time exists.
  You'll never get everyone to agree on anything.
- owenversteeg 50 days ago
  Depends on the specific issue, but race would be an interesting one. For most of recorded history people had a much different view of the “other”, more xenophobic than racist.
- mmooss 50 days ago
  Was there ever such a time or place?
  There is a modern trope of a certain political group that bias is a modern invention of another political group - an attempt to politicize anti-bias.
  Preventing bias is fundamental to scientific research and law, for example. That same political group is strongly anti-science and anti-rule-of-law, maybe for the same reason.
Teever 50 days ago
This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.
I'd love to see the output from different models trained on pre-1905 about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it be devising experiments and then giving them simulated data from the experiments to lead them along the correct sequence of steps to come to a novel (to them) conclusion.
ineedasername 50 days ago
I can imagine the political and judicial battles already, like with textualist feeling that the constitution should be understood as the text and only the text, meant by specific words and legal formulations of their known meaning at the time.
“The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, putting the common textualist interpretation of it and Supreme Court rulings on a now specious interpretation null and void!”
nineteen999 50 days ago
Interesting ... I'd love to find one that had a cutoff date around 1980.
[-]
- noumenon1111 49 days ago
  > Which new band will still be around in 45 years?
  Excellent question! It looks like Two-Tone is bringing ska back with a new wave of punk rock energy! I think The Specials are pretty special and will likely be around for a long time.
  On the other hand, the "new wave" movement of punk rock music will go nowhere. The Cure, Joy Division, Tubeway Army: check the dustbin behind the record stores in a few years.
  [-]
  - nineteen999 48 days ago
    Hahaha as someone who once played in a Cure cover band as a teenager I found this hilarious.
    I wonder what it might have predicted about the future of MS, Intel and IBM given the status quo at the time too.
    [-]
    - noumenon1111 46 days ago
      You're asking the right question!
      1. IBM, as the all-time reigning king of computing is not expected to give up its position any time soon. In fact, I'm observing a swell of new microcomputers called "personal computers," and I fully expect IBM to capitalize on this trend soon.
      2. Intel is a great company making microcontrollers and processors for microcomputers. The new 8086 microprocessor seems poised to make a splash in the new "personal computer" segment. I'll eat my hat if my prediction proves to be incorrect.
      3. "One of these things is not like the other" Microsoft makes a pretty nice BASIC for microcomputers. I can imagine this becoming standard for "personal computers." But, a tiny company like Microsoft doesn't really stack up next to an industry titan like IBM or even a major, newer player like Intel.
      If you'd like me to prognosticate some more, I'm ready. Just say the word.
doctor_blood 50 days ago
Unfortunately there isn't much information on what texts they're actually training this on; how Anglocentric is the dataset? Does it include the Encyclopedia Britannica 9th Edition? What about the 11th? Are Greek and Latin classics in the data? What about Germain, French, Italian (etc. etc.) periodicals, correspondence, and books?
Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.
Still, I'm extremely excited to see this project come to fruition!
[-]
- DGoettlich 50 days ago
  thanks. we'll be more precise in the future. ultimately, we took whatever we could get our hands on, that includes newspapers, periodicals, books. its multilingual (including italian, french, spanish etc) though majority is english.
tonymet 50 days ago
I would like to see what their process for safety alignment and guardrails is with that model. They give some spicy examples on github, but the responses are tepid and a lot more diplomatic than I would expect.
Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus. Like 30% something modern, 70% Victorian content.
Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.
[-]
- rhdunn 50 days ago
  Using texts upto 1913 includes works like The Wizard of Oz (1900, with 8 other books upto 1913), two of the Anne of Green Gables books (1908 and 1909), etc. All of which read modern.
  The Victorian era (1837-1901) covers works from Charles Dickens and the like which are still fairly modern. These would have been part of the initial training before the alignment to the 1900-cutoff texts which are largely modern in prose with the exception of some archaic language and the lack of technology, events, and language drift post that time period.
  And, pulling in works from 1800-1850 you have works by the Bronte's and authors like Edgar Allan Poe who was influential in detective and horror fiction.
  Note that other works around the time like Sherlock Holmes span both the initial training (pre-1900) and finetuning (post-1900).
  [-]
  - tonymet 49 days ago
    upon digging into it , I learned the post-training chat phases is trained on prompts with chat gpt 5.x to make it more conversational. that explains both contemporary traits.
monegator 50 days ago
I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.
kazinator 50 days ago
> Why not just prompt GPT-5 to "roleplay" 1913?
Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.
It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.
The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.
Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.
Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they will accidentally use words they didn't in fact know at fifteen. For some words you know where you got them from by association with learning events. Others, you don't remember; they are not attached to a time.
Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.
> GPT-5 knows how the story ends
No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story ending, and GPT-5 cannot refrain from predicting tokens across those texts due to their imprint in its weights. That's all there is to it.
The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.
[-]
- myrmidon 50 days ago
  I do agree with this and think it is an important point to stress.
  But we don't know how much different/better human (or animal) learning/understanding is, compared to current LLMs; dismissing it as meaningless token prediction might be premature, and underlying mechanisms might be much more similar than we'd like to believe.
  If anyone wants to challenge their preconceptions along those lines I can really recommend reading Valentino Braitenbergs "Vehicles: Experiments in synthetic psychology (1984)".
- alansaber 50 days ago
  Excuse me sir you forgot to anthropomorphise the language model
- adroniser 50 days ago
  [flagged]
andai 50 days ago
I had considered this task infeasible, due to a relative lack of training data. After all, isn't the received wisdom that you must shove every scrap of Common Crawl into your pre-training or you're doing it wrong? ;)
But reading the outputs here, it would appear that quality has won out over quantity after all!
flux3125 49 days ago
Once I had an interesting interaction with llama 3.1, where I pretended to be someone from like 100 years in the future, claiming it was part of a "historical research initiative conducted by Quantum (formerly Meta), aimed at documenting how early intelligent systems perceived humanity and its future." It became really interested, asking about how humanity had evolved and things like that. Then I kept playing along with different answers, from apocalyptic scenarios to others where AI gained consciousness and humans and machines have equal rights. It was fascinating to observe its reaction to each scenario
p0w3n3d 50 days ago
I'd love to see the LLM trained on 1600s-1800s texts that would use the old English, and especially Polish which I am interested in.
Imagine speaking with Shakespearean person, or the Mickiewicz (for Polish)
I guess there is not so much text from that time though...
TheServitor 50 days ago
Two years ago I trained an AI on American history documents that could do this while speaking as one of the signers of the Declaration of Independence. People just bitched at me because they didn't want to hear about AI.
[-]
- nerevarthelame 50 days ago
  Post your work so we can see what you made.
Departed7405 50 days ago
Awesome. Can't wait to try and ask it to predict the 20th century based on said events. Model size is small, which is great as I can run it anywhere, but at the same time reasoning might not be great.
3vidence 50 days ago
This idea sounds somewhat flawed to me based on the large amount of evidence that LLMs need huge amounts of data to properly converge during their training.
There is just not enough available material from previous decades to trust that the LLM will learn to relatively the same degree.
Think about it this way, a human in the early 1900s and today are pretty much the same but just in different environments with different information.
An LLM trained on 1/1000 the amount of data is just at a fundamentally different stage of convergence.
DonHopkins 50 days ago
I'd love for Netflix or other streaming movie and series services to provide chat bots that you could ask questions about characters and plot points up to where you have watched.
Provide it with the closed captions and other timestamped data like scenes and character summaries (all that is currently known but no more) up to the current time, and it won't reveal any spoilers, just fill you in on what you didn't pick up or remember.
bobro 50 days ago
I would love to see this LLM try to solve math olympiad questions. I’ve been surprised by how well current LLMs perform on them, and usually explain that surprise away by assuming the questions and details about their answers are in the training set. It would be cool to see if the general approach to LLMs is capable of solving truly novel (novel to them) problems.
[-]
- ViscountPenguin 50 days ago
  I suspect that it would fail terribly, it wasn't until the 1900s that the modern definition of a vector space was even created iirc. Something trained in maths up until the 1990s should have a shot though.
btrettel 50 days ago
This reminded me of some earlier discussion on Hacker News about using LLMs trained on old texts to determine novelty and obviousness of a patent application: https://news.ycombinator.com/item?id=43440273
WhitneyLand 49 days ago
Why not use these as a benchmark for LLM ability to make breakthrough discoveries?
For example prompt the 1913 model to try and “Invent a new theory of gravity that doesn’t conflict with special relativity”
Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses.
dwa3592 50 days ago
Love the concept- can help understanding the overton window on many issues. I wish there were models by decades - up to 1900, up to 1910, up to 1920 and so on- then ask the same questions. It'd be interesting to see when homosexuality or women candidates be accepted by an LLM.
alexgotoi 50 days ago
[flagged]
neom 50 days ago
This would be a super interesting research/teaching tool coupled with a vision model for historians. My wife is a history professor who works with scans of 18th century english documents and I think (maybe a small) part of why the transcription on even the best models is off in weird ways, is it seems to often smooth over things and you end up with modern words and strange mistakes, I wonder if bounding the vision to a period specific model would result in better transcription? Querying against the historical document you're working on with a period specific chatbot would be fascinating.
Also wonder if I'm responsible enough to have access to such a model...
sbmthakur 50 days ago
Someone suggested a nice thought experiment - train LLMs on all Physics before quantum physics was discovered. If the LLM can see still figure out the latter then certainly we have achieved some success in the space.
delichon 50 days ago
Datomic has a "time travel" feature where for every query you can include a datetime, and it will only use facts from the db as of that moment. I have a guess that to get the equivalent from an LLM you would have to train it on the data from each moment you want to travel to, which this project seems to be doing. But I hope I'm wrong.
It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.
underfox 49 days ago
> [They aren't] perfect mirrors of "public opinion" (they represent published text, which skews educated and toward dominant viewpoints)
Really good point that I don't think I would've considered on my own. Easy to take for granted how easy it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing the same.
mmooss 50 days ago
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
I don't mind the experimentation. I'm curious about where someone has found an application of it.
What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.
[-]
- TSiege 50 days ago
  I agree. This is just make believe based on a smaller subset of human writing than LLMs we have today. It's responses are in no way useful because it is a machine mimicking a subset of published works that survived to be digitized. In that sense the "opinions" and "beliefs" are just an averaging of a subset of a subset of humanity pre 1913. I see no value in this to historians. It is really more of a parlor trick, a seance masquerading as science.
- behringer 50 days ago
  It doesn't have to be generic. You can assign genders, ideals, even modern ones, and it should do it's best to oblige.
- mediaman 50 days ago
  This is a regurgitation of the old critique of history: what's it's purpose? What do you use it for? What is its application?
  One answer is that the study of history helps us understand that what we believe as "obviously correct" views today are as contingent on our current social norms and power structures (and their history) as the "obviously correct" views and beliefs of some point in the past.
  It's hard for most people to view two different mutually exclusive moral views as both "obviously correct," because we are made of a milieu that only accepts one of them as correct.
  We look back at some point in history, and say, well, they believed these things because they were uninformed. They hadn't yet made certain discoveries, or had not yet evolved morally in some way; they had not yet witnessed the power of the atomic bomb, the horrors of chemical warfare, women's suffrage, organized labor, or widespread antibiotics and the fall of extreme infant mortality.
  An LLM trained on that history - without interference from the subsequent actual path of history - gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.
  In that sense - if you believe there is any redeeming value to history at all; perhaps you do not - this is an excellent project! It's not perfect (it is only built from writings, not what people actually said) but we have no other available mass compression of the social norms of a specific time, untainted by the views of subsequent interpreters.
  [-]
  - vintermann 50 days ago
    One thing I haven't seen anyone bring up yet in this thread, is that there's a big risk of leakage. If even big image models had CSAM sneak into their training material, how can we trust data from our time hasn't snuck into these historical models?
    I've used Google books a lot in the past, and Google's time-filtering feature in searches too. Not to mention Spotify's search features targeting date of production. All had huge temporal mislabeling problems.
    [-]
    - DGoettlich 50 days ago
      Also one of our fears. What we've done so far is to drop docs where the datasource was doubtful about the date of publication, if there are multiple possible dates we take the latest to be conservative. During training, we validate that the model learns pre- but not post-cutoff facts. https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
      If you have other ideas or think thats not enough, I'd be curious to know! (history-llms@econ.uzh.ch)
  - mmooss 50 days ago
    > This is a regurgitation of the old critique of history: what's it's purpose? What do you use it for? What is its application?
    Feeling a bit defensive? That is not at all my point; I value history highly and read it regularly. I care about it, thus my questions:
    > gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.
    What validity does this 'compression' have? What is the definition of a 'compression'? For example, I could create random statistics or verbiage from the data; why would that be any better or worse than this 'compression'?
    Interactivity seems to be a negative: It's fun, but it would seem to highly distort the information output from the data, and omits the most valuable parts (unless we luckily stumble across it). I'd much rather have a systematic presentation of the data.
    These critiques are not the end of the line; they are step in innovation, which of course raises challenging questions and, if successful, adapts to the problems. But we still need to grapple with them.
thesumofall 50 days ago
While obvious, it’s still interesting that its morals and values seem to derive from the texts it has ingested. Does that mean modern LLMs cannot challenge us beyond mere facts? Or does it just mean that this small model is not smart enough to escape the bias of its training data? Would it not be amazing if LLMs could challenge us on our core beliefs?
Tom1380 50 days ago
Keep at it Zurich!
ulbu 49 days ago
for anyone moaning the plight that it's not accessible to you: they are historians, I think they're more educated in matters of historical mistake than you or me. playing safe is simply prudence. it is sorely lacking in the American approach to technology. prevention is the best medicine.
Myrmornis 50 days ago
It would be interesting to have LLMs trained purely on one language (with the ability to translate their input/output appropriately from/to a language that the reader understands). I can see that being rather revealing about cultural differences that are mostly kept hidden behind the language barriers.
Sprotch 49 days ago
This is a brilliant idea. We have lots of erroneous ideas about the views and thoughts people had in the past. This will show we are still, actually, largely similar. Hopefully more and more of these historical LLMs appear.
elestor 49 days ago
Excuse me if it's obvious, but how could I run this? I have run local LLMs before, but only have very minimal experience using ollama run and that's about it. This seems very interesting so I'd like to try it.
shireboy 49 days ago
Fascinating llm use case I never really thought about til now. I’d love to converse with different eras and also do gap analysis with present time - what modern advances could have come earlier, happened differently etc.
casey2 50 days ago
I'd be very surprised if this is clean of post-1913 text. Overall I'm very interested in talking to this thing and seeing how much difference writing in a modern style vs and older one makes to it's responses.
Agraillo 50 days ago
> Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu. This knowledge inevitably shapes responses, even when instructed to "forget.
> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.
I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publishing. A quick inquiry likely points to later works by Nietzsche and Marx's Das Kapital. They're possible subjects to the duplication likely influencing the model's responses as if they had been widely known at the time
tedtimbrell 50 days ago
This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.
I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems
arikrak 50 days ago
I wouldn't have expected there to be enough text from before 1913 to properly train a model, it seemed like they needed an internet of text to train the first successful LLMs?
[-]
- alansaber 50 days ago
  This model is more comparable to GPT-2 than anything we use now.
awesomeusername 50 days ago
I've always like the idea of retiring to the 19th century.
Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do
dr_dshiv 50 days ago
Everyone learns that the renaissance was sparked by the translation of Ancient Greek works.
But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of <1700 books have been translated—and less than 30% have ever been scanned.
I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.
We want to make ancient texts accessible to people and AI.
If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org
[-]
- carlosjobim 50 days ago
  Amazing project!
  May I ask you, why are you publishing the translations as PDF files, instead of the more accessible ePub format?
  [-]
  - dr_dshiv 49 days ago
    Will add, great point.
- j-bos 50 days ago
  This ia very cool but should go in a Show HN post as per HN rules. All the best!
  [-]
  - dr_dshiv 50 days ago
    Just read the rules again— was something inappropriate? Seemed relevant
    [-]
    - j-bos 49 days ago
      I can see you being right, I didn't make the connection with 20th,19th century documents and the comment felt disconnected from the thread. Either way, very cool project, worth a show hn post.
moffkalast 50 days ago
> trained from scratch on 80B tokens of historical data
How can this thing possibly be even remotely coherent with just fine tuning amounts of data used for pretraining?
why-o-why 50 days ago
It sounds like a fascinating idea, but I'd be curious if prompting a more well-known foundational model to limit itself to 1913 and early be similar.
Muskwalker 49 days ago
So, could this be an example of an LLM trained fully on public domain copyright-expired data? Or is this not intended to be the case.
[-]
- DGoettlich 49 days ago
  data is 100% public domain.
satisfice 50 days ago
I assume this is a collaboration between the History Channel and Pornhub.
“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”
TZubiri 50 days ago
hi, can I have latin only LLM? It can be latin plus translations (source and destination).
May be too small a corpus, but I would like that very much anyhow
jimmy76615 50 days ago
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond believe.
[-]
- nine_k 50 days ago
  Public access, triggering a few racist responses from the model, a viral post on Xitter, the usual outrage, a scandal, the project gets publicly vilified, financing ceases. The researchers carry the tail of negative publicity throughout their remaining careers.
  Why risk all this?
  [-]
  - vintermann 50 days ago
    Because the problem of bad faith attacks can only get worse if you fold every time.
    Sooner or later society has to come emotionally to terms with the fact that other times and places value things completely different from us, hold as important things we don't care about and are indifferent to things we do care about.
    Intellectually I'm sure we already know, but e.g. banning old books because they have reprehensible values (or even just use nasty words) - or indeed, refusing to release a model trained on historic texts "because it could be abused" is a sign that emotionally we haven't.
    It's not that it's a small deal, or should be expected to be easy. It's basically what Popper called "the strain of civilization" and posited as explanation for the totalitarianism which was rising in his time. But our values can't be so brittle that we can't even talk or think about other value systems.
  - cj 50 days ago
    Because there are easy workarounds. If it becomes an issue, you can quickly add large disclaimers informing people that there might be offensive output because, well, it's trained on texts written during the age of racism.
    People typically get outraged when they see something they weren't expecting. If you tell them ahead of time, the user typically won't blame you (they'll blame themselves for choosing to ignore the disclaimer).
    And if disclaimers don't work, rebrand and relaunch it under a different name.
    [-]
    - nine_k 50 days ago
      I wonder is you're being ironic here.
      You speak as if the people who play to an outrage wave are interested in achieving truth, peace, and understanding. Instead the rage-mongers are there to increase their (perceived) importance, and for lulz. The latter factor should not be underappreciated; remember "meme stocks".
      The risk is not large, but very real: the attack is very easy, and the potential downside, quite large. So not giving away access, but having the interested parties ask for it is prudent.
      [-]
      - cj 50 days ago
        While I agree we live in a time of outrage, that also works in your favor.
        When there’s so much “outrage” every day, it’s very easy to blend in to the background. You might have a 5 minute moment of outrage fame, but it fades away quick.
        If you truly have good intentions with your project, you’re not going to get “canceled”, your career won’t be ruined
        Not being ironic. Not working on a LLM project because you’re worried about getting canceled by the outrage machine is an overreaction IMO.
        Are you able to name any developer or researcher who has been canceled because of their technical project or had their careers ruined? The only ones I can think of are clearly criminal and not just controversial (SBF, Snowden, etc)
  - kurtis_reed 50 days ago
    If people start standing up to the outrage it will lose its power
    [-]
  - Forgeties79 50 days ago
    > triggering a few racist responses from the mode
    I feel like, ironically, it would be folks less concerned with political correctness/not being offensive that would abuse this opportunity to slander the project. But that’s just my gut.
    [-]
    - dingnuts 50 days ago
      [dead]
  - NuclearPM 50 days ago
    That’s ridiculous. There is no risk.
  - nofriend 50 days ago
    People know that models can be racist now. It's old hat. "LLM gets prompted into saying vile shit" hasn't been notable for years.
  - Alex2037 50 days ago
    nobody gives a shit about the journos and the terminally online. the smear campaign against AI is a cacophony, background noise that most people have learned to ignore, even here.
    consider this: https://news.ycombinator.com/from?site=nytimes.com
    HN's most beloved shitrag. day after day, they attack AI from every angle. how many of those submissions get traction at this point?
  - why-o-why 50 days ago
    I think you are confusing research with commodification.
    This is a research project, and it is clear how it was trained, and targeted at experts, enthusiasts, historians. Like if I was studying racism, the reference books explicitly written to dissect racism wouldn't be racist agents with a racist agenda. And as a result, no one is banning these books (except conservatives that want to retcon american history).
    Foundational models spewing racist white supremecist content when the trillion-dollar company forces it in your face is a vastly different scenario.
    There's a clear difference.
    [-]
    - aidenn0 50 days ago
      > And as a result, no one is banning these books (except conservatives that want to retcon american history).
      My (very liberal) local school district banned English teachers from teaching any book that contained the n-word, even at a high-school level, and even when the author was a black person talking about real events that happened to them.
      FWIW, this was after complaints involving Of Mice and Men being on the curriculum.
      [-]
      - zoky 50 days ago
        Banning Huckleberry Finn from a school district should be grounds for immediate dismissal.
        [-]
        somenameforme 50 days ago
        Even more so as the lesson of that story is perhaps the single most important one for people to learn in modern times.
        Almost everybody in that book is an awful person, especially the most 'upstanding' of types. Even the protagonist is an awful person. The one and only exception is 'N* Jim' who is the only kind-hearted and genuinely decent person in the book. It's an entire story about how the appearances of people, and the reality of those people, are two very different things.
        It being banned for using foul language, as educational outcomes continue to deteriorate, is just so perfectly ironic.
        why-o-why 50 days ago
        I don't support banning the book, but I think it is hard book to teach because it needs SO much context and a mature audience (lol good luck). Also, there are hundreds of other books from that era that are relevant even from Mark Twain's corpus so being obstinate about that book is a questionable position. I'm ambivalent honestly, but definitely not willing to die on that hill. (I graduated highschool in 1989 from a middle class suburb, we never read it.)
        [-]
        zoky 50 days ago
        I mean, you gotta read it. I’m not normally a huge fan of the classics; I find Steinbeck dry and tedious, and Hemingway to be self-indulgent and repetitious. Even Twain’s other work isn’t exactly to my taste. But I’ve read Huckleberry Finn three times—in elementary school just for fun, in high school because it was assigned, and I recently listened to it on audiobook—and enjoyed the hell out of each time. Banning it simply because it uses a word that the entire book simply couldn’t exist without is a crime, and does a huge disservice to the very students they are supposedly trying to protect.
        [-]
        why-o-why 50 days ago
        I have read it. I spent my 20s guiltily reading all of the books I was supposed to have read in high school but used Cliff's Notes instead. From my 20's perspective I found Finn insipid and hokey but that's because pop culture had recycled it hundreds of times since its first publication, however when I consider it from the period perspective I can see the satire and the pointed allegories that made Twain so formidable. (Funny you mention Hemingway. I loved his writing in my 20's, then went back and read some again in my 40's and was like "huh, this irritating and immature, no wonder i loved it in my 20's.")
      - Forgeties79 50 days ago
        It’s a big country of roughly half a billion people, you’ll always find examples if you look hard enough. It’s ridiculous/wrong that your district did this but frankly it’s the exception in liberal/progressive communities. It’s a very one-sided problem:
        * https://abcnews.go.com/US/conservative-liberal-book-bans-dif...
        * https://www.commondreams.org/news/book-banning-2023
        *https://en.wikipedia.org/wiki/Book_banning_in_the_United_Sta...
        [-]
        aidenn0 49 days ago
        I agree that the coordinated (particularly at a state level) restrictions[1] on books sits largely with the political Right in the US.
        However, from around 2010, there has been increasingly illiberal movement from the political Left in the US, which plays out at a more local level. My "vibe" is that it's not to the degree that it is on the Right, but bigger than the numbers suggest because librarians are more likely to stock e.g. It's Perfectly Normal at a middle school than something offensive to the left.
        1: I'm up for suggestions for a better term; there is a scale here between putting absurd restrictions on school librarians and banning books outright. Fortunately the latter is still relatively rare in the US, despite the mistitling on the Wikipedia page you linked.
        somenameforme 50 days ago
        A practical issue is the sort of books being banned. Your first link offer examples of one side trying to ban Of Mice and Men, Adventures of Huckleberry Finn, and Dr. Seuss, with the other side trying to ban many books along the lines of Gender Queer. [1] That link is to the book - which is animated, and quite NSFW.
        There are a bizarrely large number similar book as Gender Queer being published, which creates the numeric discrepancy. The irony is that if there was an equal but opposite to that book about straight sex, sexuality, associated kinks, and so forth - then I think both liberals and conservatives would probably be all for keeping it away from schools. It's solely focused on sexuality, is quite crude, illustrated, targeted towards young children, and there's no moral beyond the most surface level writing which is about coming to terms with one's sexuality.
        And obviously coming to terms with one's sexuality is very important, but I really don't think books like that are doing much to aid in that - especially when it's targeted at an age demographic that's still going to be extremely confused, and even moreso in a day and age when being different, if only for the sake of being different, is highly desirable. And given the nature of social media and the internet, decisions made today may stay with you for the rest of your life.
        So for instance about 30% of Gen Z now declare themselves LGBT. [2] We seem to have entered into an equal but opposite problem of the past when those of deviant sexuality pretended to be straight to fit into societal expectations. And in many ways this modern twist is an even more damaging form of the problem from a variety of perspectives - fertility, STDs, stuff staying with you for the rest of your life, and so on. Let alone extreme cases where e.g. somebody engages in transition surgery or 1-way chemically induced changes which they end up later regretting.
        [1] - https://archive.org/details/gender-queer-a-memoir-by-maia-ko...
        [2] - https://www.nbcnews.com/nbc-out/out-news/nearly-30-gen-z-adu...
        [-]
        Forgeties79 50 days ago
        From your NBC piece
        > About half of the Gen Z adults who identify as LGBTQ identify as bisexual,
        So that means ~15% of those surveyed are not attracted to the opposite sex (there’s more nuance to this statement but I imagine this needs to stay boilerplate), more or less, which is a big distinction. That’s hardly alarming and definitely not a major shift. We have also seen many cultures throughout history ebb and flow in their expression of bisexuality in particular.
        > There are a bizarrely large number similar book as Gender Queer being published, which creates the numeric discrepancy.
        This really needs a source. And what makes it “bizarrely large”? How does it stack against, say, the number heterosexual romance novels?
        > We seem to have entered into an equal but opposite problem of the past when those of deviant sexuality pretended to be straight to fit into societal expectations.
        I really tried to give your comment a fair shake but I stopped here. We are not going to have a productive conversation. “Deviant sexuality” come on man.
        Anyway it doesn’t change the fact that the book banning movement is largely a Republican/conservative endeavor in the US. The numbers clearly bear it out.
        [-]
        somenameforme 50 days ago
        I'll get back to what you said, but first let me ask you something if you would. Imagine Gender Queer was made into a movie that remained 100% faithful to the source content. What do you think it would be rated? To me it seems obvious that it would, at the absolute bare minimum, be R rated. And of course screening R-rated films at a school is prohibited without explicit parental permission. Imagine books were given a rating and indeed it ended up with an R rating. Would your perspective on it being unavailable at a school library then be any different? I think this is relevant since a standardized content rating system for books will be the long-term outcome of this all if efforts to introduce such material to children continues to persist.
        ------
        Okay, back to what you said. 30% being attracted to the same sex in any way, including bisexuality, is a large shift. People tend to have a mistaken perception of these things due to media misrepresentation. The percent of all people attracted to the same sex, in any way, is around 7% for men, and 15% for women [1], across a study of numerous Western cultures from 2016. And those numbers themselves are significantly higher than the past as well where the numbers tended to be in the ~4% range, though it's probably fair to say that cultural pressures were driving those older numbers to artificially low levels in the same way that I'm arguing that cultural pressures are now driving them to artificially high levels.
        Your second source discusses the reason for the bans. It's overwhelmingly due to sexually explicit content, often in the form of a picture book, targeted at children. As for "sexual deviance", I'm certainly not going General Ripper on you, Mandrake. It is the most precise term [2] for what we are discussing as I'm suggesting that the main goal driving this change is simply to be significantly 'not normal.' That is essentially deviance by definition.
        [1] - https://www.researchgate.net/publication/301639075_Sexual_Or...
        [2] - https://dictionary.apa.org/sexual-deviance
        [-]
        Forgeties79 49 days ago
        > any sexual behavior, such as a paraphilia, that is regarded as significantly different from the standards established by a culture or subculture. Deviant forms of sexual behavior may include voyeurism, fetishism, bestiality, necrophilia, sadism, and exhibitionism
        I don’t see Lesbian, Gay, Bisexual, or Transgender in here, which would absolutely be explicitly included in the list if it applied. Stop saying “sexual deviants” when talking about LGBT people. You know what you’re doing, it’s an incredibly loaded and inaccurate term. To continue calling them “sexual deviants” is a hostile and openly bigoted act. Bestiality and homosexuality are not in the same category and you are wrong to assert otherwise - all while masking it by misrepresenting the APA’s stance at that.
        I am not discussing this further. Enjoy the rest of your weekend.
        [-]
        somenameforme 48 days ago
        I'm not at all bigoted. If somebody genuinely is sexually attracted to the same sex, more power to them. Homosexuality also exists within nature and there are obviously people who simply have never been attracted to anything except the same sex since their first days. It's completely unreasonable to expect these people to try to change who they are on such a fundamental level, and so I think society, at large, should absolutely be tolerant of such.
        But there is a major difference between tolerating something and endorsing it. I think this is especially true in modern times. 30% of people are obviously not LGB. So you have people acting out sexually in a way that's probably not only 'unnatural' for them, but may end up harming them longterm. It's not a great situation. Because of this I do not indulge language policing which I believe is much more towards endorse than tolerate. Yes you are obviously right I'm aware of what I'm doing, but I also assure you if we met and had a coffee you'd find me anything but bigoted or hostile. We just have different worldviews.
        [-]
        aidenn0 46 days ago
        In the 1940s, Kinsey et. al. found that 37% of Adult Males had at least one homosexual experience, and 10% were "more or less exclusively homosexual for at least 3 years."
        The numbers for women were much lower, but 30% doesn't seem crazy high if you consider the reduced stigma of the bisexual label would allow people who are primarily heterosexual, but are open to homosexual experiences to label themselves as bi.
        [-]
        somenameforme 46 days ago
        Kinsey's work was poor quality and suffered from irreconcilable volunteer bias. It was completely based on people willing to be interviewed, in excessive detail and in his uniquely invasive 1 on 1 fashion, about their most intimate sexual experiences. Even today that is not something which 'normal' people agree to, and this was done during the 40s and 50s! On top of that he made 0 effort whatsoever to obtain a representative sample of society, so it's a biased sample of a biased sample, which drives an exponential deviation from reality due to multiplicative biasing.
        This is where you get his conclusions such as 37% of men having had a homosexual experience, or 69% of men having purchased a prostitute. It's plainly ridiculous.
        aidenn0 49 days ago
        [dead]
    - andsoitis 50 days ago
      > no one is banning these books
      No books should ever be banned. Doesn’t matter how vile it is.
  - gnarbarian 50 days ago
    this is FUD.
  - teaearlgraycold 50 days ago
    Sure but Grok already exists.
- dash2 50 days ago
  You have to understand that while the rest of the world has moved on from 2020, academics are still living there. There are many strong leftists, many of whom are deeply censorious; there are many more timeservers and cowards, who are terrified of falling foul of the first group.
  And there are force multipliers for all of this. Even if you yourself are a sensible and courageous person, you want to protect your project. What if your manager, ethics committee or funder comes under pressure?
- fkdk 50 days ago
  Maybe the authors are overly careful. Maybe avoiding to publish aspects of their work gives an edge over academic competitors. Maybe both.
  In my experience "data available upon request" doesn't always mean what you'd think it does.
smugtrain 49 days ago
This would actually be a wonderful way to learn physics, before GR and quantum mechanics
davidpfarrell 49 days ago
Can't wait for all the syncopated "Thou dost well to question that" responses!
PeterStuer 49 days ago
How does it do on Python coding? Not 100% troll, cross domain coherence is a thing.
dkalola 49 days ago
How can we interact with such models? Is there a web application interface?
Aeroi 49 days ago
i feel like this would be super useful for unique marketing copy and writing. The responses sound so sophisticated like I read it in my grandfather's tone and cadence.
joeycastillo 50 days ago
A question for those who think LLM’s are the path to artificial intelligence: if a large language model trained on pre-1913 data is a window into the past, how is a large language model trained on pre-2025 data not effectively the same thing?
[-]
- _--__--__ 50 days ago
  You're a human intelligence with knowledge of the past - assuming you were alive at the time, could you tell me (without consulting external resources) what exactly happened between arriving at an airport and boarding a plane in the year 2000? What about 2002?
  Neither human memory nor LLM learning creates perfect snapshots of past information without the contamination of what came later.
- block_dagger 50 days ago
  Counter question: how does a training set, representing a window into the past, differ from your own experience as an intelligent entity? Are you able to see into the future? How?
  [-]
- ex-aws-dude 50 days ago
  A human brain is a window to the person's past?
superkuh 50 days ago
smbc did a comic about this: http://smbc-comics.com/comic/copyright The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.
[-]
- GaryBluto 50 days ago
  That's the point of this project, to have an LLM that reflects the moral and ethical norms of pre-1913 texts.
kldg 49 days ago
Very neat! I've thought about this with frontier models because they're ignorant of recent events, though it's too bad old frontier models just kind of disappear into the aether when a company moves on to the next iteration. Every company's frontier model today is a time capsule for the future. There should probably be some kind of preservation attempts made early so they don't wind up simply deleted; once we're in Internet time, sifting through the data to ensure scrapes are accurately dated becomes a nightmare unless you're doing your own regular Internet scrapes over a long time.
It would be nice to go back substantially further, though it's not too far back that the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; look forward to testing it out.
lifestyleguru 50 days ago
You think Albert is going to stay in Zurich or emigrate?
erichocean 49 days ago
I would love to see this done, by year.
"Give me an LLM from 1928."
etc.
mleroy 50 days ago
Ontologically, this historical model understands the categories of "Man" and "Woman" just as well as a modern model does. The difference lies entirely in the attributes attached to those categories. The sexism is a faithful map of that era's statistical distribution.
You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".
[-]
- arowthway 50 days ago
  I think much of the semantic proximity to evil can be derived straight from the facts? Imagine telling pre-1913 person about the holocaust.
anovikov 50 days ago
That Adolf Hitler seems to be a hallucination. There's totally nothing googlable about him. Also what could be the language his works were translated from, into German?
[-]
- sodafountan 50 days ago
  I believe that's one of the primary issues LLMs aim to address. Many historical texts aren't directly Googleable because they haven't been converted to HTML, a format that Google can parse.
ianbicking 50 days ago
The knowledge machine question is fascinating ("Imagine you had access to a machine embodying all the collective knowledge of your ancestors. What would you ask it?") – it truly does not know about computers, has no concept of its own substrate. But a knowledge machine is still comprehensible to it.
It makes me think of the Book Of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.
[-]
- jaggederest 50 days ago
  Jonathan Swift wrote about something we might consider a computer in the early 18th century, in Gulliver's Travels - https://en.wikipedia.org/wiki/The_Engine
  The idea of knowledge machines was not necessarily common, but it was by no means unheard of by the mid 18th century, there were adding machines and other mechanical computation, even leaving aside our field's direct antecedents in Babbage and Lovelace.
usernamed7 50 days ago
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
oh COME ON... "AI safety" is getting out of hand.
zkmon 50 days ago
Why does history end in 1913?
diamond559 49 days ago
Research credits from lambda "ai" huh, where's your funding coming from this again? All to provide inaccurate slop to unwitting students, you should be ashamed of yourselves.
holyknight 50 days ago
wow amazing idea
r0x0r007 50 days ago
ffs, to find out what figures from the past thought and how they felt about the world, maybe we read some of their books, we will get the context. Don't prompt or train LLM to do it and consider it the hottest thing since MCP. Besides, what's the point? To teach younger generations a made up perspective of historic figures? Who guarantees the correctness/factuality? We will have students chatting with made up Hitler justifying his actions. So much AI slop everywhere.
internationalis 50 days ago
[dead]
internationalis 50 days ago
[dead]
acharneski 49 days ago
[dead]