I was super-excited about vector search and embeddings in 2024 but my enthusiasm has faded somewhat in 2025 for a few reasons:
- LLMs with a grep or full-text search tool turn out to be great at fuzzy search already - they throw a bunch of OR conditions together and run further searches if they don't find what they want (see the FTS sketch after this comment)
- ChatGPT web search and Claude Code code search are my favorite AI-assisted search tools and neither bother with vectors
- Building and maintaining a large vector search index is a pain. The vectors are usually pretty big and you need to keep them in memory to get truly great performance. FTS and grep are way less hassle.
- Vector matches are weird. So you get back the top twenty results... those might be super relevant or they might be total garbage, it's on you to do a second pass to figure out if they're actually useful results or not.
I expected to spend much of 2025 building vector search engines, but ended up not finding them as valuable as I had thought.
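To make the "OR conditions" point concrete, here is a minimal sketch of the kind of query an LLM-driven search tool tends to emit, shown against a hypothetical SQLite FTS5 table called `docs` (the table, column, and search terms are made up for illustration):

```python
import sqlite3

# Hypothetical corpus: an SQLite FTS5 table named `docs` with a `body` column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("how to rotate an API key",), ("regenerating access tokens",), ("password reset flow",)],
)

# An LLM doing "fuzzy" search just ORs together synonyms and related terms,
# inspects the hits, and issues follow-up queries if nothing looks right.
query = '"api key" OR token OR credential OR secret OR "access key"'
rows = conn.execute(
    "SELECT body, rank FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT 20",
    (query,),
).fetchall()
print(rows)
```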
You might be getting a good _recall_ rate, since vector search is ANN, but the _precision_ can be low because the reranker piece is missing. So I would slightly improve it by adding 10 more lines of code and introducing a reranker after the search (while slightly increasing topK). Query expansion at the beginning can also be added to improve recall.
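A rough sketch of that "retrieve wider, then rerank" idea using sentence-transformers; the `ann_search` helper and the cross-encoder model name are placeholders, not anything from the thread:

```python
from sentence_transformers import CrossEncoder

# Placeholder: assume ann_search(query, top_k) returns (doc_id, text) pairs
# from whatever approximate-nearest-neighbour index you already have.
def ann_search(query: str, top_k: int) -> list[tuple[str, str]]:
    raise NotImplementedError("plug in your vector store here")

def search_with_rerank(query: str, final_k: int = 20) -> list[tuple[str, str, float]]:
    # 1. Over-fetch from the ANN index (good recall, mediocre precision).
    candidates = ann_search(query, top_k=final_k * 5)

    # 2. Rerank the candidates with a cross-encoder (precision comes from here).
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, text) for _, text in candidates])

    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [(doc_id, text, float(score)) for (doc_id, text), score in ranked[:final_k]]
```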
While embeddings are generally not required in the context of code, I am interested in how they perform in the legal and regulatory domain, where documents are substantially longer. Specifically, how do embeddings compare with approaches such as ripgrep in terms of effectiveness?
There’s a lot of previously intractable problems that are getting solved with these new embeddings models. I’ve been building a geocoder for the past few months and it’s been remarkable how close to google places I can get with just slightly enriched open street maps plus embedding vectors
That sounds really interesting. If you’re open to it, I’d be curious what the high-level architecture looks like (what gets embedded, how you rank results)?
What about re-ranking? In my limited experience, adding fast+cheap re-ranking with something like Cohere to the query results took an okay vector-based search and made the top 1-5 results much stronger
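For reference, the Cohere rerank call is roughly this shape (a sketch; `candidates` stands in for whatever your vector search returned, and the model name may have changed since):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

candidates = [
    "Doc text one ...",
    "Doc text two ...",
    "Doc text three ...",
]

# Rerank the vector-search candidates against the user query;
# only the top few are kept for display or the LLM context.
response = co.rerank(
    model="rerank-english-v3.0",
    query="how do I rotate an api key",
    documents=candidates,
    top_n=5,
)
for result in response.results:
    print(result.index, result.relevance_score)
```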
Query expansion and re-ranking can and often do coexist.
Roughly, first there is the query analysis/manipulation phase, where you might have NER, spell check, query expansion/relaxation, etc.
Then there is the selection phase, where you retrieve all items that are relevant. Sometimes people will bring in results from both text- and vector-based indices. Perhaps an additional layer to group results.
Then finally you have the reranking layer using a cross-encoder model, which might even have some personalisation in the mix.
Also, with vector search you might not necessarily need query expansion, since semantic similarity already does loose association. But every domain is unique and there's only one way to find out.
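A skeleton of those three phases, with every retrieval call stubbed out; this is only meant to show the shape of the pipeline, and the reciprocal-rank-fusion merge is one common choice rather than anything prescribed above:

```python
def analyse_query(raw: str) -> list[str]:
    # Phase 1: spell check, NER, expansion/relaxation would go here.
    return [raw]  # stub: returns the original query with no expansions

def text_index_search(query: str, k: int) -> list[str]:
    return []  # stub: BM25 / FTS doc ids would come back here

def vector_index_search(query: str, k: int) -> list[str]:
    return []  # stub: ANN doc ids would come back here

def select_candidates(queries: list[str], k: int = 100) -> dict[str, float]:
    # Phase 2: pull from both a text index and a vector index, then fuse.
    # Reciprocal rank fusion is one simple way to merge the two result lists.
    fused: dict[str, float] = {}
    for q in queries:
        for results in (text_index_search(q, k), vector_index_search(q, k)):
            for rank, doc_id in enumerate(results):
                fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (60 + rank)
    return fused

def rerank(query: str, doc_ids: list[str], top_n: int = 10) -> list[str]:
    # Phase 3: score (query, doc) pairs with a cross-encoder, optionally blend
    # in personalisation signals, and keep only the top few.
    return doc_ids[:top_n]  # stub

def search(raw_query: str) -> list[str]:
    queries = analyse_query(raw_query)
    candidates = select_candidates(queries)
    ordered = sorted(candidates, key=candidates.get, reverse=True)
    return rerank(raw_query, ordered)
```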
Models like bge are small, and quantized versions will fit in the browser or on a tiny machine. Not sure why everyone reaches for an API as their first choice
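As a point of reference, running a small embedding model locally is only a few lines with sentence-transformers; the specific bge variant below is an assumption, not one named in the comment:

```python
from sentence_transformers import SentenceTransformer

# BAAI/bge-small-en-v1.5 is on the order of a hundred MB; quantized builds are
# smaller still and run fine on CPU or in the browser (e.g. via transformers.js).
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

docs = ["reset your password", "rotate the api key", "invoice history"]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode("how do I change my api key", normalize_embeddings=True)
scores = doc_vecs @ query_vec  # cosine similarity, since the vectors are normalized
print(scores)
```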