I like that this relies on generating SQL rather than just being a black-box chat bot. It feels like the right way to use LLMs for research: as a translator from natural language to a rigid query language, rather than as the database itself. Very cool project!
Hopefully your API doesn't get exploited and you are doing timeouts/sandboxing -- it'd be easy to do a massive join on this.
I also have a question, mostly stemming from my not being knowledgeable in the area -- have you noticed any semantic bleeding when research is done between your datasets? e.g., "optimization" probably means different things under ArXiv, LessWrong, and HN. Wondering if vector searches account for this given a more specific question.
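On the timeouts/sandboxing point, here is a minimal sketch of the kind of guardrail being suggested, assuming a Postgres backend reached via psycopg2 (the thread doesn't confirm the stack; the function name and limits are made up for illustration):

```python
# Sketch only: server-side guardrails for an endpoint that runs LLM-generated SQL.
# Assumes Postgres + psycopg2; names and limits are illustrative, not the project's.
import psycopg2

def run_untrusted_query(dsn: str, sql: str, timeout_ms: int = 5000, max_rows: int = 1000):
    """Run one generated query under a read-only session and a hard statement timeout."""
    conn = psycopg2.connect(dsn)
    conn.set_session(readonly=True)           # reject writes at the session level
    try:
        with conn.cursor() as cur:
            cur.execute(f"SET statement_timeout = {int(timeout_ms)}")  # kill runaway joins
            cur.execute(sql)
            return cur.fetchmany(max_rows)    # cap what goes back to the client
    finally:
        conn.rollback()
        conn.close()
```

Combined with a dedicated read-only database role (which the public read-only key mentioned later in the thread suggests is already the intent), that covers most of the "massive join" worry.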
I've been thinking about this a fair bit lately. We have all sorts of benchmarks that describe a lot of factors in detail, but they are all very abstract and don't seem to map clearly onto well-observed behaviors. I think we need a different way to enumerate them.
This is very cool. If you're productizing this you should try to target a vertical. What does "literally don't have the money" mean? You should try to raise some in the traditional way. If nothing else works, at least try to apply to YC.
It could be distributed as a Claude skill. Internally, we've bundled a lot of external APIs and SQL queries into skills that are shared across the company.
This may exist already, but I'd like to find a way to query 'Supplementary Material' in biomedical research papers for genes / proteins or even biological processes.
As it is, the Supplementary Materials are inconsistently indexed so a lot of insight you might get from the last 15 years of genomics or proteomics work is invisible.
I imagine this approach could work, especially for Open Access data?
I wanted to find all cryoprotective agents that were tested at different temperatures, but it should be extendable to your problem too. It uses OpenAlex to traverse a citation graph and open-access PDFs.
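For anyone wanting to try the same trick, a rough sketch of that OpenAlex traversal: start from one work, pull the open-access works that cite it, and collect the reachable PDFs. The seed ID below is a placeholder, not a real paper from this thread.

```python
# One hop of the citation graph via the OpenAlex /works endpoint and its cites: filter.
import requests

OPENALEX = "https://api.openalex.org/works"
SEED = "W0000000000"  # hypothetical OpenAlex work ID, replace with a real one

def citing_open_access(seed_id: str, per_page: int = 25):
    """Yield (title, pdf_url) for open-access works that cite `seed_id`."""
    resp = requests.get(
        OPENALEX,
        params={"filter": f"cites:{seed_id}", "per-page": per_page},
        timeout=30,
    )
    resp.raise_for_status()
    for work in resp.json()["results"]:
        oa = work.get("open_access") or {}
        if oa.get("is_oa") and oa.get("oa_url"):
            yield work["display_name"], oa["oa_url"]

for title, pdf_url in citing_open_access(SEED):
    print(title, pdf_url)
```

Recursing on each result's citing works gives the full traversal; the Supplementary Material parsing is the part OpenAlex won't do for you.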
I think a prompt + an external dataset is a very simple distribution channel right now for exploring anything quickly with low friction. The `curl | bash` of 2026.
It is not a protected term, so anything is state-of-the-art if you want it to be.
For example, Gemma models at the moment of release were performing worse than their competition, but still, they were "state-of-the-art". That does not mean it's a bad product at all (Gemma is actually good), but the claims are handed out very freely.
Juicero was state-of-the-art on release too, though hands were better, etc.
> It's just marketing. [...] It is not a protected term, so anything is state-of-the-art if you want it to be.
But is it true?
I think we ought to stop indulging and rationalizing self-serving bullshit with the "it's just marketing" bit, as if that somehow makes bullshit okay. It's not okay. Normalizing bullshit is culturally destructive and reinforces the existing indifference to truth.
Part of the motivation people have seems to be a cowardly, morbid fear of conflict, or of acknowledging that the world is a mess. But I'm not even suggesting conflict. I'm suggesting demoting the dignity of bullshitters in one's own estimation of them. A bullshitter should appear trashy to us, because bullshitting is trashy.
Really useful. I'm currently working on an autonomous academic research system [1] and thinking about integrating this. Currently using a custom prompt + the Edison Scientific API. Any plans to make this open source?

[1] https://github.com/giatenica/gia-agentic-short
It's ultimately just a prompt; self-hosted models can use the system the same way, they just might struggle to write good SQL + vector queries to answer your questions. The prompt also works well with Codex, which has a lot of usage.
Has anyone tried to use these prompts with Gemini 3 Pro? It feels like the latest Claude, Gemini, and GPT offerings are on par (excluding costs), and as a developer, if you know how to query/spec a coding LLM, you can move between them with ease.
This is great:

> @FTX_crisis - (@guilt_tone - @guilt_topic)
Using an LLM for tasks that could be done faster with traditional algorithmic approaches seems wasteful, but this is one of the few legitimate cases where embeddings are doing something classical IR literally cannot. You could also make the LLM explain the query it's about to run. Before execution:
“Here’s the SQL and semantic filters I’m about to apply. Does this match your intent?”
Great idea! I just overhauled the prompt to explain the SQL + semantic filters better, and give the user clearer adjustment opportunities before long-running queries.
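For reference, a minimal sketch of that confirmation gate; `generate_query` and `run_query` are placeholders for whatever the tool actually does, the point is only the pause between generation and execution:

```python
# Sketch of a "show the plan before running it" step for LLM-generated queries.
def confirm_and_run(question: str, generate_query, run_query):
    plan = generate_query(question)  # e.g. {"sql": "...", "semantic_filters": [...]}
    print("Here's the SQL and semantic filters I'm about to apply:")
    print(plan["sql"])
    print("Semantic filters:", plan.get("semantic_filters", []))
    answer = input("Does this match your intent? [y/N] ").strip().lower()
    if answer != "y":
        return None  # let the user rephrase instead of paying for a long-running query
    return run_query(plan)
```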
> I can embed everything and all the other sources for cheap, I just literally don't have the money.
How much do you need for the various leaks, like the Paradise Papers, the Panama Papers, the Offshore Leaks, the Bahamas Leaks, the FinCEN Files, the Uber Files, etc., and what's your Venmo?
I think you misunderstood. The API key is for their API, not Anthropic.
If you take a look at the prompt you'll find that they have a static API key that they have created for this demo ("exopriors_public_readonly_v1_2025")
The use case could vary from person to person. When you think about it, Hacker News has a large enough data set (and one that is widely accessible) to allow all sorts of fun analyses. In a sense, the appeal is: who knows what kind of fun patterns could emerge.
Would you mind walking through the logic of that a bit for me? I'm definitely interested in productizing this, and would be interested in open sourcing as soon as I have breathing room (I have no money).
Does that first generated query really work? Why are you looking at URIs like that? First you filter for a uri match, then later filter out that same match, minus `optimization`, when you are doing the cosine distance. Not once is `mesa-optimization` even mentioned, which is supposed to be the whole point?
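For contrast, a hypothetical pgvector-style version of what a query that actually targets `mesa-optimization` could look like; the table and column names are invented, not the project's real schema, and sentence-transformers stands in for its embedding model:

```python
# Hypothetical sketch, not the project's actual schema: embed the phrase itself and
# rank by cosine distance instead of round-tripping through URI string filters.
from sentence_transformers import SentenceTransformer  # stand-in embedding model

model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("mesa-optimization")
vec_literal = "[" + ",".join(f"{x:.6f}" for x in vec) + "]"  # pgvector text format

SQL = """
SELECT title, url, 1 - (embedding <=> %(q)s::vector) AS similarity
FROM posts                            -- invented table name
WHERE source = 'lesswrong'            -- invented column
ORDER BY embedding <=> %(q)s::vector  -- pgvector cosine-distance operator
LIMIT 20;
"""
# cur.execute(SQL, {"q": vec_literal})   # with any Postgres driver, e.g. psycopg2
```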
Just comes down to your own view of what AGI is, as it's not particularly well defined.
While a bit 'time-machiney' - I think if you took an LLM of today and showed it to someone 20 years ago, most people would probably say AGI has been achieved. If someone wrote a definition of AGI 20 years ago, we would probably have met that.
We have certainly blasted past some science-fiction examples of AI like Agnes from The Twilight Zone, which 20 years ago looked a bit silly, and now looks like a remarkable prediction of LLMs.
By today's definition of AGI we haven't met it yet, but eventually it comes down to 'I know it when I see it' - the problem with that definition is that it is polluted by what people have already seen.
> I think if you took an LLM of today and showed it to someone 20 years ago, most people would probably say AGI has been achieved.
I’ve got to disagree with this. All past pop-culture AI was sentient and self-motivated; it was human-like in that it had its own goals and autonomy.
Current AI is a transcript generator. It can do smart stuff but it has no goals, it just responds with text when you prompt it. It feels like magic, even compared to 4-5 years ago, but it doesn’t feel like what was classically understood as AI, certainly by the public.
Somewhere marketers changed AGI to mean “does predefined tasks with human level accuracy” or the like. This is more like the definition of a good function approximator (how appropriate) instead of what people think (or thought) about when considering intelligence.
> Current AI is a transcript generator. It can do smart stuff but it has no goals
That's probably not because of an inherent lack of capability, but because the companies that run AI products don't want to run autonomous intelligent systems like that
> If someone wrote a definition of AGI 20 years ago, we would probably have met that.
No, as long as people can do work that a robot cannot do, we don't have AGI. That was always, if not the definition, at least implied by the definition.
I don't know why the meme of AGI being not well defined has had such success over the past few years.
I think it was supposed to be a more useful term than the earlier and more common "Strong AI". With regard to strong AI, there was a widely accepted definition - i.e. passing the Turing Test - and we are way past that point already (see https://arxiv.org/pdf/2503.23674).
I have to challenge the paper authors' understanding of the Turing test. For an AI system to pass the Turing test, its output needs to be indistinguishable from a human's. In other words, the rate of picking the AI system as human should be equal to the rate of picking the human. If in an experiment the AI system is picked at a rate higher than 50%, it does not pass the Turing test (contrary to what the authors seem to believe), because another human can use this knowledge to conclude that the system being picked is not really human.
Also, I would go one step further and claim that to pass the Turing test an AI system should be indistinguishable from a human when judged by people trained in making such a distinction. I doubt that they used such people in the experiment.
I doubt that any AI system available today, or in the foreseeable future, can pass the test as I qualify it above.
People are constantly being fooled by bots in forums like Reddit and this one. That's good enough for me to consider the Turing test passed.
It also makes me consider it an inadequate test to begin with, since all classes of humans including domain experts can be fooled and have been in the past. The Turing test has always said more about the human participants than the machine.
Completely disagree - Your definition (in my opinion) is more aligned to the concept of Artificial Super Intelligence.
Surely the 'General Intelligence' definition has to be consistent between 'Artificial General Intelligence' and 'Human General Intelligence', and humans can be generally intelligent even if they can't solve calculus equations or protein folding problems. My definition of general intelligence is much lower than most - I think a dog is probably generally intelligent, although obviously in a different way (dogs are obviously better at learning how to run and catch a ball, and worse at programming python).
I do consider dogs to have "general intelligence" however despite that I have always (my entire life) considered AGI to imply human level intelligence. Not better, not worse, just human level.
It gets worse though. While one could claim that scoring equivalently on some benchmark indicates performance at the same level - and I'd likely agree - that's not what I take AGI to mean. Rather I take it to mean "equivalent to a human" so if it utterly fails at something we're good at such as driving a car through a construction zone during rush hour then I don't consider it to have met the bar of AGI even if it meets or exceeds us at other unrelated tasks. You have to be at least as general as a stock human to qualify as AGI in my books.
Now I may be but a single datapoint but I think there are a lot of people out there who feel similarly. You can see this a lot in popular culture with AGI (or often AI) being used to refer to autonomous humanoid robots portrayed as operating at or above a human level.
Related to all that, since you mention protein folding. I consider that to be a form of super intelligence as it is more or less inconceivable that an unaided human would ever be able to accomplish such a feat. So I consider alphafold to be both super intelligent and decidedly _not_ AGI. Make of that what you will.
The book is a collection of nine short stories telling the tale of three generations of a family before, during, and after a technological singularity.
Actually, this has already happened in a very literal way. Back in 2022, Google DeepMind used an AI called AlphaTensor to "play" a game where the goal was to find a faster way to multiply matrices, the fundamental math that powers all AI.
To understand how big this is, you have to look at the numbers:
The Naive Method: This is what most people learn in school. To multiply two 4x4 matrices, you need 64 multiplications.
The Human Record (1969): For over 50 years, the "gold standard" was Strassen’s algorithm, which used a clever trick (sketched just after this list) to get it down to 49 multiplications.
The AI Discovery (2022): AlphaTensor beat the human record by finding a way to do it in just 47 steps.
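To make the 49-vs-64 bookkeeping concrete, here is a quick numerical check of Strassen's 2x2 identity, which uses 7 multiplications instead of 8; applying it recursively to the 2x2 blocks of a 4x4 matrix gives 7 × 7 = 49 multiplications versus the naive 64.

```python
# Verify Strassen's 7-multiplication scheme for 2x2 matrices against the naive method.
import random

def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    # 8 multiplications
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
B = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
assert strassen_2x2(A, B) == naive_2x2(A, B)  # 7 multiplications, same result
```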
The real "intelligence explosion" feedback loop happened even more recently with AlphaEvolve (2025). While the 2022 discovery only worked for specific "finite field" math (mostly used in cryptography), AlphaEvolve used Gemini to find a shortcut (48 steps) that works for the standard complex numbers AI actually uses for training.
Because matrix multiplication accounts for the vast majority of the work an AI does, Google used these AI-discovered shortcuts to optimize the kernels in Gemini itself.
It’s a literal cycle: the AI found a way to rewrite its own fundamental math to be more efficient, which then makes the next generation of AI faster and cheaper to build.

https://deepmind.google/blog/discovering-novel-algorithms-wi... https://www.reddit.com/r/singularity/comments/1knem3r/i_dont...
This is obviously cool, and I don't want to take away from that, but using a shortcut to make training a bit faster is qualitatively different from producing an AI which is actually more intelligent. The more intelligent AI can recursively produce a more intelligent one and so on, hence the explosion. If it's a bit faster to train but the same result then no explosion. It may be that finding efficiencies in our equations is low hanging fruit, but developing fundamentally better equations will prove impossible.
This made me laugh. Unfortunately, this is the world we live in. Most people who drive cars have no idea how they work, or how to fix them. And people who get on airplanes aren't able to flap their arms and fly.
Which means that humans are reduced to a sort of uselessness / helplessness, using tools they don't understand.
Overall, no one tells Uncle Bob that he doesn't deserve to fly home to Minnesota for Christmas because he didn't build the aircraft himself.
> have you noticed any semantic bleeding when research is done between your datasets?

Larger, more capable embedding models are better able to separate the different uses of a given word in the embedding space; smaller models are not.
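Since the question was also whether a more specific query helps, here is a small, illustrative probe: embed the bare term and two source-flavored variants and see which documents each lands nearest. sentence-transformers is only a stand-in for the project's actual model (the thread mentions Voyage-3.5-lite), and the documents are made-up snippets in the style of each source.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

queries = {
    "bare": "optimization",
    "arxiv-flavored": "optimization: convex objectives, gradient descent, convergence rates",
    "lesswrong-flavored": "optimization: mesa-optimizers, inner alignment, optimization daemons",
}
docs = {
    "arxiv-style": "We prove convergence rates for stochastic gradient descent on convex problems.",
    "lesswrong-style": "A mesa-optimizer is a learned model that is itself running an optimization process.",
}

doc_vecs = {name: model.encode(text) for name, text in docs.items()}
for qname, qtext in queries.items():
    qvec = model.encode(qtext)
    print(qname, {d: round(float(cos_sim(qvec, v)), 3) for d, v in doc_vecs.items()})
```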
what makes this state of the art?
> Currently have embedded: posts: 1.4M / 4.6M, comments: 15.6M / 38M

That's with Voyage-3.5-lite.
Okaaaaaaay....
> most people would probably say AGI has been achieved

Most people who took a look at a carefully crafted demo, i.e. the CEOs who keep pouring money down this hole.
If you actually use it you'll realize it's a tool, and not a particularly dependable tool unless you want to code what amounts to the React tutorial.
We've just become accustomed to it now, and tend to focus more on the flaws than the progress.
> no one tells Uncle Bob that he doesn't deserve to fly home to Minnesota for Christmas because he didn't build the aircraft himself

But we all think it.