Mistral OCR 3

(mistral.ai)

694 points | by pember 51 days ago

25 comments

temp0826 49 days ago
My current holy grail is my attempt to convert a Shipibo (an indigenous Peruvian language)-to-Spanish dictionary into a Shipibo-to-English dictionary. The pdf I have (available freely on archive.org) isn't a great scan (though I think it'd be a heck of a lot easier than some of the handwritten examples they show). Layout (2-columns) along with header/footers can cause some headaches, but it is all Latin script. This seems to fall on its face pretty badly (not even a couple of pages in), so my search continues. (The other major problem I'm having is trying to separate out Shipibo definitions/examples from the Spanish ones, and only translating the Spanish to English...so pretty complex I guess. I've been taking fresh stabs at this project every few months when I see OCR/LLM news pop up and continue to be disappointed)
[-]
- culi 49 days ago
  I'm assuming you're interested in studying Ayahuasca traditions?
  I recently learned that traditionally in Shipibo culture, ayahuasca was never meant to be given to "the normal mind". Instead the maestras would be the ones taking the ayahuasca in order to help guide them into diagnosing people dealing with various sicknesses.
  These maestras were also ranked by how many different plants they'd done a dieta on. A dieta is kinda similar to fasting. You can't shower with soap, you can't have sex, you can't have too much salt/seasoning, can't be exposed to too much smoke, can't have alcohol, etc. And you use that specific plant throughout your time. Basically you want to eliminate any conflicting variables so you can experience the plant as purely as possible to understand its effects. Traditionally these dietas could last over a year but modern day maestros typically do them for just a few weeks.
  I don't really have a point to this. Just found it fascinating how deeply and strictly they study certain plant medicines and wanted to share
  [-]
  - temp0826 49 days ago
    Yes essentially. I've got a few resources cobbled together over the last few years but it'd be really nice to have this reference (my Spanish isn't the best, and running to the translator for a definition can be a little annoying). Also to share with fellow learners/apprentices I know. There are a couple of classes out there (which are actually geared more toward the ceremonial/icaro language, not purely conversational Shipibo, which is a bit simpler as you don't need to worry as much about conjugation and other complexities) which I might look into eventually.
    (Fwiw I've accumulated a couple years worth of dieta under my belt and am well aware of the restrictions! It's indeed very fascinating, been pretty serious about it the last few years and I've barely scratched the surface)
    [-]
    - canucker2016 49 days ago
      Couldn't you use your smartphone and Google Lens (on Android, Google app on iOS includes Google Lens functionality) to translate the Spanish to English?
      FYI - Lens on Android does in-place language translation including attempting to use the same/similar font that the original language is written/printed.
      Unfortunately, I don't think Lens can be used in an automated batch translation mode to convert an entire book/multiple pages
  - mkaic 49 days ago
    I suppose both of us watched the same Youtube video by Metta Beshay (i think that is his name?)
    [-]
    - temp0826 49 days ago
      I actually did too lol. I was pleasantly surprised because it was actually decent and realistic about the situation (a lot of people get this romantic idea about going to the jungle to live and learn with the indigenous and have an "authentic" experience, and this does a pretty good job if dispelling that).
- b112 49 days ago
  I applaud your efforts, but that seems difficult to me. There's so much nuance in language, and the original spanish translation would even be dependent upon locale-destination of the original dictionary. Which would also be time based, as language changes over time.
  And that translation is likely only a rough approximation, as words don't often translate directly. To add in an extra layer (spanish -> english) seems like another layer of imperfect (due to language) abstraction.
  Of course your efforts are targeting a niche, so likely people will understand the attempt and be thankful. I hope this suggestion isn't too forward, but this being an electronic version, you could allow some way for the original spanish to be shown if desired. That sort of functionality would be quite helpful, even non-native spanish speakers might get a clearer picture.
  What tools are you using to abstract all of this?
  If the spacing and columns of the images are consistent, I'd think imagemagick would allow you to automate extraction by column (eg, cutting the individual pages up), and OCR could then get to work.
  For the Shipibo side, I'd want to turn off all LLM interpretation. That tends to use known groupings of words to probabilistically determine best-match, and that'd wreak havoc in this case.
  Back to the images, once you have imagemagick chop and sort, writing a very short script to iterate over the pages, display them, and prompt with y/n would be a massive time saver. Doing so at each step would be helpful.
  For example, one step? Cut off header and footer, save to dir. Using helpful naming conventions (page-1, and page-1-noheader_footer). You could then use imagemagick to combine page-1 and -age-1-noheader_footer side by side.
  Now run a simple bash vet script. Each of 500 pages pops up, you instantly see the original and the cut result, and you hit y or n. One could go through 500 pages like this in 10 to 20 minutes, and you'd be left with a small subset of pages that didn't get cut properly (extra large footer or whatever). If it's down to 10 pages or some such, that's an easy tweak and fix for those.
  Once done, you could do the same for column cuts. You'd already have all the scripts, so it's just tweaking.
  I'm mentioning all of this, because combo of automation plus human intervention is often the best method to something such as this.
  Anyhow, good luck!
  [-]
  - temp0826 49 days ago
    Thanks for the suggestions, I do appreciate it. I was being pretty brief with my post but I really have spent a lot of time and tried this from a number of angles. I've had good luck with non-LLM tools to do the initial OCR, but it's not context aware especially about column/page breaks (like I mentioned it's kind of a dirty scan, and if the breaks happen on a Shipibo part it barfs a bit. Good for a rough search at least).
    I would love to create a json version of it that would essentially have a bunch of fields for each word (Shipibo/Spanish/English word/definition/example, type of word, etc). It's further complicated by how words can be modified in Shipibo (it's actually a very technical language- words can have any number of prefixes and suffixes tagged on to change their meaning and their precision. In their "icaros", the healing songs they sing in ceremony, the most technical use of the language is considered to be the most beautiful. Essentially poetry from their "medical" jargon).
    I've done some human-in-the-loop attempts but still come up short in one way or another (I end up getting frustrated and throwing my hands up after seeing how much time I dump on it). So I figure this will remain a good test as the tools (and my prompting abilities) get better. It's definitely not urgent for me.
- knadh 49 days ago
  Once you have managed to get the data out and structured, you may want to check out dict.press. It's a dictionary publishing and management tool (which I maintain). Multiple widely used Indian dictionary projects run on it.
  [-]
  - temp0826 49 days ago
    Will take a look, I assumed there'd be some tools along those lines, thanks for the suggestion
- 29athrowaway 48 days ago
  The linguistic holy grails over there are resolving the mysteries around Qhapac simi, Puquina and quipus.
vintermann 49 days ago
I appreciate having an OCR interface rather than having to chat with a bot, but unfortunately chatting with Gemini 3 gives far better results than this. I gave it the document Gemini 3 got a surprisingly good result on:
https://urn.digitalarkivet.no/URN:NBN:no-a1450-rk10101508282...
and the output wasn't even recognizably Danish.
Just out of pity I gave it a birthday card from my sister written in very readable modern handwriting, and while in managed to make the contents of that readable, the errors it made reveals that it has very little contextual intelligence. Even if ! and ? can be hard to tell apart sometimes, they weren't here, and you do not usually start a birthday letter with "Happy Birthday brother?"
[-]
- potsandpans 47 days ago
  Something I noticed about gemini: I've been experimenting with transcribing old handwritten gaelic archives. Qwen 235b a22b instruct appears to give a much more faithful reproduction compared to gemini, for the simple fact that gemini keeps hallucinating an old gaelic faerie tale
- hoyd 48 days ago
  Voynich manuscript next :-)
- suspended_state 49 days ago
  > got a surprisingly good result
  > the output wasn't even recognizably Danish
  How would you know that it's good then?
  [-]
  - kadoban 49 days ago
    I believe you misread. My reading is that Gemini 3 gave a good result on a certain input, so they gave the same input to this model and the result was poor.
    [-]
    - vintermann 48 days ago
      Yes. I can also read maybe 60-80% of this document tolerably well myself, with effort.
    - suspended_state 49 days ago
      You're correct.
petcat 49 days ago
It seems like Mistral is just chasing around sort of "the fringes" of what could be useful AI features. Are they just getting out-classed by OAI, Google, Anthropic?
It seems like EU in general should be heavily invested in Mistral's development, but it doesn't seem like they are.
[-]
- tensor 49 days ago
  Form processing is vastly more useful than meme generation. When people need to do real work this is the sort of tool they are going to reach for.
  [-]
  - sbuttgereit 49 days ago
    Yep. I saw the title and got excited.... this is a particular problem area where I think these things can be very effective. There are so many data entry class tasks which don't require huge knowledge or judgement... just clear parsing and putting that into a more machine digestible form.
    I don't know... feels like this sort of area, while not nearly so sexy as video production or coding or (etc.)... but seems like reaching a better-than-human performance level should be easier for these kinds of workloads.
- bee_rider 49 days ago
  Following the leaders too closely seems like a bad move, at least until a profitable business model for an AI model training company is discovered. Mistral’s models are pretty good, right? I mean they don’t have all the scaffolding around them that something like chatGPT does, but building all that scaffolding could be wasted effort until a profitable business model is shown.
  Until then, they seem to be able to keep enough talent in the EU to train reasonably good models. The kernel is there, which seems like the attainable goal.
  [-]
  - qwytw 49 days ago
    >Mistral’s models are pretty good, right
    Are they? IIRC their best model is still worse than the gpt-oss-120B?
    [-]
    - amarcheschi 49 days ago
      Devstral 2 should be above https://mistral.ai/news/devstral-2-vibe-cli
      Though I haven't checked other benchmarks and they only report swe
      [-]
      - acters 49 days ago
        Devstral 2 is free from the API. That has to be a bigger point to what makes it better. The price to performance ratio is practically better in every way. Does it matter if the performance is slightly worse when it is practically free?
        [-]
        qwytw 49 days ago
        Yes, but if it's actually competitive that won't last that long. Mistral will do the same as google (cut their free tier by 50x or so) if they ever catch up. Financially anything else would make no sense.
        Of course currently Mistral has an insane free tier, 1 billion tokens for each(?) of their models per month.
    - tomalbrc 49 days ago
      Calling it oss is a farce
  - menaerus 49 days ago
    They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole life to Paris or London.
    This goes to show how leaders in Mistral don't quite get that they are not special as they seem to think they are. Anthropic or OpenAI also require their talent to relocate but with stakes that are at least a high reward - $500k or $1M a year is a good start that is maybe worth investing into.
    [-]
    - StopDisinfo910 48 days ago
      > They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole life to Paris or London.
      The best talents have been regularly leaving Paris and London, India and China for decades. With the US closing its borders, they definitely have a chance to lure some.
    - bee_rider 49 days ago
      If somebody is in the EU already that calculation completely flips. We have a strong software startup industry in the US, would it really be that surprising if there was more unallocated talent in the EU, at this point?
      [-]
      - menaerus 48 days ago
        > If somebody is in the EU already that calculation completely flips.
        Would you find it compelling to move your whole life for ~100k EUR when you can make as much or more at your home city, with a job that is almost certainly more stable?
        And I meant the Europeans. People in EU don't have a culture of moving between cities or countries unless they really have a strong reason to, e.g. can't find a job at home.
        > would it really be that surprising if there was more unallocated talent in the EU, at this point?
        I am pretty sure there is. It has changed over the course of last few years, primarily because of COVID, and companies willing to offer remote contracts, but it's far from being able to utilize the talent.
        [-]
        wqaatwt 46 days ago
        > People in EU don't have a culture of moving between cities or countries
        Southern and Eastern Europeans certainly do.
        [-]
        menaerus 46 days ago
        I said unless they have a strong reason to. Do they?
        [-]
        wqaatwt 45 days ago
        Do people in other places constantly move to other states/countries on a whim without any strong reasons?
- _menelaus 49 days ago
  Mistral is pursuing pursuing B2B use cases. Thats because they're releasing open models and the big thing about B2B is they HATE sending their data off-prem. OCR'ing and organizing old docs is a huge feature in B2B. Mistral's strategy seems smart to me.
  [-]
  - voxic11 48 days ago
    Why did they make this model only available though their API then?
    [-]
    - _menelaus 47 days ago
      That is a good question, I don't know.
- IMTDb 49 days ago
  > It seems like EU in general should be heavily invested in Mistral's development, but it doesn't seem like they are
  The EU is extremely invested in Mistral's development: half of the effort is finding ways to tax them (hello Zucman tax), the other half is wondering how to regulate them (hello AI act)
  [-]
  - District5524 49 days ago
    Zucman taxes rich individuals (100m€+), not Mistral. AI Act rules are not that difficult to comply with by GPAI model providers as long as the model doesn't become systemic risk... They have to spend a lot more time on PR and handshaking with French politicians than on AI compliance. They probably don't even have a single FTE for that... So that's just prejudice I believe.
- lm28469 49 days ago
  We're too busy with real life to bother with generating SVGs of pelicans on bicycles sorry, but feel free to dump billions on chatbots
- VWWHFSfQ 49 days ago
  I think there is a lot of broad support, but they're just kind of hamstrung by EU regulation on AI development at this stage. I think the end game will ultimately be getting acquired by an American company, and then relocating.
  [-]
  - tensor 49 days ago
    I hope the EU blocks any acquisitions by American companies. The west needs to start protecting its strategic assets.
  - bootsmann 49 days ago
    Do you have any source on this other than vibes based on”EU bad” sentiment?
- BoredPositron 49 days ago
  I guess it's better to do the same stuff everyone else is doing?
- lawlessone 49 days ago
  >It seems like EU in general should be heavily invested
  Maybe, i think it will be to our benefit when the bubble pops that we are not heavily invested, no harm investing a little.
GZGavinZhao 49 days ago
Does it handle math expressions (those rendered from LaTeX) well? I've been looking for a good OCR model to transcribe my math textbooks into markdown (obviously ignoring the images and figures) with LaTeX as math expressions, and none of the current OCR models work reliably enough.
EDIT: you can try it yourself for free at https://console.mistral.ai/build/document-ai/ocr-playground once you create a developer account! Fingers crossed to see how well it works for my use case.
[-]
- loaf_api 49 days ago
  I've just finished processing thousands of documents using the Gemini Pro 3 vision model and it outperformed every OCR and image model I've tested by a long shot, perfect markdown with latex for the math every time.
  [-]
  - lysecret 49 days ago
    3 flash is also insanely good even slightly outperforms 3 pro for me.
  - pacman1337 49 days ago
    what prompt are you using?
- RagnarD 49 days ago
  Please post an update on how well it works for you.
- nerbert 49 days ago
  Just need to open the link to answer that question.
vasco 49 days ago
Gave it a birth registry from a Portuguese locality from 1755 which my dad and I often decipher to figure out geneology and it did a terrible job.
Regular Gemini Thinking can actually get 70-80% of the documents correct except lots of mistakes on given names. Chatgpt maybe understands like 50-60%.
This Mistral model butchered the whole text, literally not a word was usable. To the point I think I'm doing something wrong.
The test document: https://files.fm/u/3hduyg65a5
[-]
- CamperBob2 49 days ago
  Quick tip: when you digitize a page, put a sheet of black paper behind it. That keeps the ink on the other side from bleeding through.
  [-]
  - vasco 49 days ago
    You can tell that to the national archives!
- observationist 49 days ago
  Just gave it a shot with Grok 4.1 thinking - do you have the ground truth translation to compare? I've tried 4 different times, with slight tweaks adding information from your description, and it's given me a range of interpretations. It'd be nice to see if any of them got close - a couple were more like pulpy telenovela plots, lol.
  The model might need tuning in order to be effective - this is normal for releases of image mode models, and after a couple days, there will be properly set up endpoints to test from, so it might be much better than you think. Or it could be really bad with turn of the 19th century portugese cursive.
- zzleeper 49 days ago
  Oh god, I'm sure I wouldn't come close to 50%; that's so hard to read
  [-]
  - vasco 49 days ago
    It's tough but my dad is quite good at it. He has books of common abbreviations and agglutinations from different centuries. After you get used to it it's faster and very fun.
    We were mind blown how good Gemini was at it.
    [-]
    - ilamont 49 days ago
      I am too. Gemini 3.0 fast on old scrawled diary entries in English from 100+ years ago got them 95% right. It also added historical context when I prefaced the images with the identity of the writer, such as summaries of an old military unit history in Europe post-WW1 it got from a very obscure U.S. Army archive.
      Huge timesaver.
- amelius 49 days ago
  Forgivable, as that's a quite atypical document, I'd say.
  [-]
  - vintermann 46 days ago
    There are thousands of thousands of documents like this, and many which are much harder to read. It's a very typical document for genealogy.
  - vasco 49 days ago
    Not atypical enough for Gemini is my point. Also its one of the most common hand written document types in existance since at the time almost nobody other than the local priest knew how to write and birth and marriage certificates were probably the only written documents in whole towns and villages. This is the same throughout Europe at least.
Tiberium 49 days ago
From a tweet: https://x.com/i/status/2001821298109120856
> can someone help folks at Mistral find more weak baselines to add here? since they can't stomach comparing with SoTA....
> (in case y'all wanna fix it: Chandra, dots.ocr, olmOCR, MinerU, Monkey OCR, and PaddleOCR are a good start)
[-]
- belval 49 days ago
  I've worked on document extraction a lot and while the tweet is too flippant for my taste, it's not wrong. Mistral is comparing itself to non-VLM computer vision services. While not necessarily what everyone needs, they are a very different beasts compared to VLM based extraction because it gives you precise bounding boxes, usually at the cost of larger "document understanding".
  Its failure mode are also vastly different. VLM-based extraction can misread entire sentences or miss entire paragraphs. Sonnet 3 had that issue. Computer vision models instead will make in-word typos.
  [-]
  - wills_forward 49 days ago
    Why not use both? I just built a pipeline for document data extraction that uses PaddleOCR, then Gemini 3 to check + fix errors. It gets close to 99.9% on extraction from financial statements finally on par with humans.
    [-]
    - vrc 49 days ago
      I did the opposite. Tesseract to get bboxes, words, and chars and then mistral on the clips with some reasonable reflow to preserve geometry. Paddle wasn’t working on my local machine (until I found RapidOCR). Surya was also very good but because you can’t really tweak any knobs, when it failed it just kinda failed. But Surya > Rapid w/ Paddle > DocTr > Tesseract while the latter gave me the most granularity when I needed it.
      Edit: Gemini 2.0 was good enough for VLM cleanup, and now 2.5 or above with structured output make reconstruction even easier.
    - jadbox 49 days ago
      This is The Way. Remember AI doesn't have to replace existing solutions but can tactfully supplement it.
  - zerocrates 49 days ago
    Is DeepSeek's not VLM?
- vinckr 49 days ago
  after clicking on your link I browsed twitter for a minute and damn that place has become weird (or maybe it always was?)
  [-]
  - crystal_revenge 49 days ago
    As someone who has been on Twitter since 2007, it’s radically changed in the last few years to the point of being unrecognizable.
- ozgune 49 days ago
  Also, do you know if their benchmarks are available?
  In their website, the benchmarks say “Multilingual (Chinese), Multilingual (East-asian), Multilingual (Eastern europe), Multilingual (English), Multilingual (Western europe), Forms, Handwritten, etc.” However, there’s no reference to the benchmark data.
- logicprog 49 days ago
  I'd want to see a comparison with Qwen 3 VL 235B-A22B, which is IME significantly better than MinerU.
- nerbert 49 days ago
  On the OP link, they compare themselves to the capabilities of leaderboard AI's and beat them.
hereme888 49 days ago
I'm reading worse performance than many OSS offerings like Paddle, MinerU, MonkeyOCR, etc:
https://www.codesota.com/ocr
[-]
- rafram 49 days ago
  Their handwriting benchmark is not useful. The test cases aren’t even handwritten!
  https://www.codesota.com/ocr/best-for-handwriting
  [-]
  - kwikiel 49 days ago
    That’s just the illustration. But this is misleading - I will fix it asap and show real examples. I’ve run the mistral ocr on other benchmark
  - dr_dshiv 49 days ago
    Do you know of any good handwriting eval/benchmark? I haven’t been able to find one.
- nextworddev 49 days ago
  Thanks for sharing this site
jwr 49 days ago
Sadly, only available through a hosted API. I don't see how this is useful for OCR, unless you are OK with uploading your confidential documents to "the cloud"?
I'm still hoping for improved locally hosted models: qwen3-vl:30b-a3b-thinking-q4_K_M is already really good.
[-]
- dan-robertson 49 days ago
  Businesses sign contracts about what happens when the data is uploaded. Ultimately your purpose is to make money more than maximally locking down your IP.
pzo 49 days ago
there has been so many open source OCR in the last 3 months that would be good to compare to those especially when some are not even 1B params and can be run on edge devices.
- paddleOCR-VL
- olmOCR-2
- chandra
- dots.ocr
I kind of miss there is not many leaderboard sections or arena for OCR and CV and providers hosting those. Neglected on both Artificial Analysis and OpenRouter.
[-]
- culi 49 days ago
  Someone posted a project here about a month ago where they compare models in head-to-head matchups similar to llmarena
  https://www.ocrarena.ai/leaderboard
  Hasn't been updated for Mistral but so far gemeni seems to top the leaderboard.
  [-]
  - jeffbee 49 days ago
    OCR developers from decades past must be slapping their foreheads now that it seems users will wait a whole minute per page and be happy.
    [-]
    - delaminator 49 days ago
      What they are happy about is accurate OCR.
      Getting the wrong answer really quickly is not the best goal.
    - culi 49 days ago
      You can also sort by latency. dots.ocr has the lowest at 3.8s/page. And although it doesn't fare very well against much larger slower models, it's still streets ahead of traditional OCR techniques
  - andai 49 days ago
    How can something have a very high ELO but a very low win rate?
    [-]
    - BlackLotus89 49 days ago
      You don't loose any elo if your opponent is much stronger than you. Remis could in theory play a part as well.
  - pplonski86 49 days ago
    very nice comparison! I'd like to see on what examples OCR engines fail
- pzo 49 days ago
  what I like in MistralOCR is that they have simple pricing $1/1k pages and API hosted on their servers. With other OCR is hard to compare pricing because are token based and you don't know how many tokens is the image unless you run your own test.
  E.g. with Gemini 3.0 flash you might seem that model pricing increased only slightly comparing to Gemini 2.5 flash until you test it and will see that what used to be 258 per 384x384 input tokens now is around 3x more.
  [-]
  - gunalx 49 days ago
    But they doubled the price g for this new mistralocr3 model to 2$
  - amelius 49 days ago
    Simple would be to bill per character.
    Now I have to figure out how large a page can be.
- andai 49 days ago
  I spent like three hours trying to get one of these running and then gave up. I think the paddleOCR one.
  It took an hour and a half to install 12 gigabytes of pytorch dependencies that can't even run on my device, and then it told me it had some sort of versioning conflict. (I think I was supposed to use UV, but I had run out of steam by that point.)
  Maybe I should have asked Claude to install it for me. I gave Claude root on a $3 VPS, and it seems to enjoy the sysadmin stuff a lot more than I do...
  Incidentally I had a similar experience installing open web UI... It installed 12 GB of pytorch crap.. I rage quit and deleted the whole thing, and replicated the functionality I actually needed in 100 lines of HTML.... Too bad I can't do that with OCR ;)
  [-]
  - CamperBob2 49 days ago
    gemini-cli is good for this sort of thing. You can just tell it "Find out why xyz.py doesn't run" and let it crunch. It will try reasonably hard to get you out of Python dependency hell, and (more important) it generally knows when to give up.
    But yes, in general, you want to use uv. Otherwise, the next Python application you install WILL break the last one you installed.
    I suppose you could use gemini-cli as a substitute for proper Python virtual environment management, always letting it fix whatever broke since the last time you tried to run the program, but that'd be like burning down a rainforest to toast a marshmallow.
    [-]
    - andai 49 days ago
      Actually, I just remembered, this was inside uv!
- hereme888 49 days ago
  https://www.codesota.com/ocr
- jammo 49 days ago
  [dead]
tecoholic 49 days ago
> Mistral OCR 3 is ideal for both high-volume enterprise pipelines and interactive document workflows.
I don’t know how they can make this statement with 79% accuracy rate. For any serious use case, this is an unacceptable number.
I work with scientific journals and issues like 2.9+0.5 and 29+0.5 is something we regularly run into that has us never being able to fully trust automated processes and require human verification every step.
[-]
- knrz 49 days ago
  Those are tricky! We've found https://www.datalab.to/ to be good for this @ thesynthesis.company
- MallocVoidstar 49 days ago
  Where are you seeing 79% accuracy? 79% only occurs on the page as a win rate, not an accuracy
  [-]
  - g947o 49 days ago
    And I believe the number is 74%, compared to OCR 2.
    What matters is whether this is better than competition/alternatives. Of course nobody is just going to take the output as is. If you do that, that's your problem.
    [-]
    - skygazer 49 days ago
      79% win over OCR2 was just for English.
  - tecoholic 49 days ago
    Right! I didn’t know the difference. Does it mean for 79 out of 100 documents they produce 100% accurate OCR, I doubt it. The win rate sounds like a practical approximation of accuracy here to me.
    If I am wildly off, I am happy to learn.
    [-]
    - ricardobeat 49 days ago
      79% of the time it beats the previous model.
      The previous version already achieved up to 99% accuracy in multiple benchmarks, already better than most OCR software.
      [-]
      - tecoholic 49 days ago
        Thank you.
    - argsnd 49 days ago
      79 out of 100 documents Mistral OCR 3 provides better output than Mistral OCR 2.
jesuslop 49 days ago
I am testing it as a replacement of MathPix, first few tests look rather decent. In python for windows: https://pastebin.com/uyiFHKdJ (alpha version prototype). Launches windows snip tool, waits for clipboard image, calls Mistral, retrieves markdown and puts it as text in the clipboard, ready to be pasted in Typora, Obsidian, or other markdown editor.
speff 49 days ago
This might be a good place to check the options available for OCR in-place translations. I took a look at OCR3, but it doesn't seem to support my use-case. It looks more tailored towards data extraction for further processing.
I've got some foreign artbooks that I would like to get translated. The translations would need to be in place since the placement of the text relative to the pictures around it is fairly important. I took a look at some paid options online, but they seemed to choke - mostly because of the non-standard text placements and all.
The best solution I could come up with is using Google Lens to overlay a translation while I go through the books, but holding a camera/tablet up to my screen isn't very comfortable. Chrome has Lens built in, but (IIRC) I still need to manually select sections for it to translate - it's not as easy to use as just holding my phone up.
Anyone know of any progress towards in-place OCR/translations?
[-]
- claar 49 days ago
  If you don't mind a paid solution, try DEEPL. I also use Word's built in document translation to good effect.
  [-]
  - speff 49 days ago
    I don't mind paying for one, though I do remember trying DEEPL without much success. Can't remember the problem offhand, but one of the services I tried just gave me a generic error when I uploaded the PDF. My view at the time was that it had a conniption and just gave up.
    Wonder if Word uses the same system Edge has. I remember Edge was also good, but like Chrome's Lens, I'd need to highlight sections for it to get translated. Edge also OCR'd everything very well - just didn't do the translation part automatically.
- haraldooo 49 days ago
  I’m fairly confident this is solvable quite well with “just two api calls”. Are examples of those books available online?
  [-]
  - speff 49 days ago
    Sure - there are some good examples in the product pictures for this book: https://www.amazon.com/hands-Takami-Kagami-teaches-power/dp/...
ethin 48 days ago
So I tried this on the NVMe specification (I have a huge library of PDFs) and it worked decently, though the output had some oddities:
- Parts of the table of contents were headings
- I didn't like how tables were links to separate markdown files.
In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation and I wasn't even sure how to go about that given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check to make sure it's correct compared to what pdftotext or acrobat could create, so there's that too.
film42 49 days ago
Is open router still sending all OCR jobs to Mistral? I wonder if they're trying to keep that spot. Seems like Mistral and Google are the best at OCR right now, with Google leading Mistral by a fair bit.
[-]
- numlocked 49 days ago
  (I work at OpenRouter) If you send a PDF to our API we will:
  1. Use native PDF parsing if the model supports it
  2. Use this Mistral OCR model (we updated to this version yesterday)
  3. UNLESS you override the "engine" param to use an alternate. We support a JS-based (non-LLM) parser as well [0]
  So yes, in practice a lot of OCR jobs go to Mistral, but not all of them.
  Would love to hear requests for other parsers if folks have them!
  [0] https://openrouter.ai/docs/guides/overview/multimodal/pdfs#p...
  [-]
  - vikp 49 days ago
    Hey, I'm the founder of Datalab (we released Chandra OCR). I see someone requested it below - happy to help you all get setup. I'm vik@datalab.to
  - siquick 49 days ago
    That links gives an error and so does https://openrouter.ai/docs/guides/overview/multimodal/pdfs
  - dimitri-vs 49 days ago
    Chandra
singularity2001 49 days ago
No one mentioning the possibly most beautiful css effect on the Internet??
[-]
- jbk 49 days ago
  How so?
i_am_not_groot 49 days ago
Finally a way to read doctor's prescriptions
7thpower 49 days ago
My main beef with mistral is that they don’t bother to respond to customer inquiries for products the hide behind “reach out for pricing” terms, so even if they were better than SoTA it wouldn’t really matter.
[-]
- 650REDHAIR 49 days ago
  I absolutely loathe dealing with sales people.
  I will pay a premium for an inferior product or service if it means I don't have to deal with sales people.
  [-]
  - 7thpower 49 days ago
    Agreed. In this case the offering just fit neatly into a non core stack we had designed and displaced a bunch of stuff didn’t want to build ourselves.
    I also hate dealing with sales people and am not going to reach out to them via another avenue as they will try and posture as if they’re doing us a huge favor (in contrast to me begging gdb for gpt4 api access).
constantinum 49 days ago
At instances where data accuracy is of paramount importance, i think a hybrid route of non-llm ocr for data parsing and LLMs for structured data extraction is the safe passage to tread on. Seen better results for LLMWhisperer(OCR)[1] and Latest Gemini.
[1] - https://pg.llmwhisperer.unstract.com/
Western0 49 days ago
I need solresol in any language. It are constructed for discusion and negotiation on war
singularity2001 49 days ago
Not OS / free weights right?
stri8ted 48 days ago
What languages does it support? I can't find this info anywhere on the page.
amelius 49 days ago
Can we have an open source tool that uses the same API, and that you can just instruct to use Mistral or any other service if you think the open source tool has quality issues for a particular text?
This makes more sense to me, as I find that FOSS OCR is quite okay for most usecases.
awaymazdacx5 49 days ago
[dead]
greenique 49 days ago
[dead]
_el9w 49 days ago
[flagged]
[-]
- ipsum2 49 days ago
  You might want to mention that you are a competitor.
  [-]
  - ghjv 49 days ago
    thought you were being flip or assuming that, but checked their profile and you are right. I agree that this should be disclosed in their comment.
  - brcmthrowaway 49 days ago
    How do you know that?
    [-]
    - btheunissen 49 days ago
      All of their comments are either a) "this OCR is bad", or b) "here is my team's OCR, it is good". Quite blatant.