You built a cool product. I'm actually one of the founders of https://medisearch.io, which is similar to what you are building. I think the long-tail problem you describe can be solved in ways other than live APIs, and you may run into other problems with them.
Thanks! I just took a look at MediSearch. It looks really clean.
You are definitely right that live APIs come with their own headaches (mostly latency and rate limits).
For now, I chose this path to avoid the infrastructure overhead of maintaining a massive fresh index as a solo dev. However, I suspect that as usage grows, I will have to move toward a hybrid model where I cache or index the 'head' of the query distribution to improve performance.
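(Concretely, even a simple in-process LRU cache over the search call would cover those head queries. A minimal sketch in Python, with `search_pubmed` as a hypothetical stand-in for whatever the live API call ends up being:)

```python
from functools import lru_cache

def search_pubmed(query: str) -> list[str]:
    """Stand-in for the live API call; returns result IDs."""
    ...

@lru_cache(maxsize=4096)
def cached_search(query: str) -> tuple[str, ...]:
    # Hot 'head' queries stay resident in memory; the long tail
    # still falls through to the live API on a cache miss.
    return tuple(search_pubmed(query) or ())
```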
Always great to meet others tackling this space. I’d love to swap notes sometime if you are open to it.
a good system (like openevidence) indexes every paper released, and semantic search can be incredibly helpful since the search APIs of all those providers are extremely limited in terms of quality.
now you get why those systems are not cheap. keeping indexes fresh, maintaining high quality at large scale, and being extremely precise is challenging. by having distributed indexes you are at the mercy of the API providers, and i can tell you from previous experience that it won't be 'currently accurate'.
for transparency: i am building a search api, so i am biased. but i have also been building medical retrieval systems for some time.
Appreciate the transparency and the insight from a fellow builder.
You are spot on that maintaining a fresh, high-quality index at scale is the 'hard problem' (and why tools like OpenEvidence are expensive).
However, I found that for clinical queries, Vector/Semantic Search often suffers from 'Semantic Drift'—fuzzily matching concepts that sound similar but are medically distinct.
My architectural bet is on Hybrid RAG:
1. Trust the MeSH: I rely on PubMed's strict Boolean/MeSH search for retrieval, because for specific drug names or gene variants, exact keyword matching beats vector cosine similarity.
2. LLM as the reranker: Since API search relevance can indeed be noisy, I cast a wider net (top ~30-50 abstracts) and use the LLM's context window to 'rerank' and filter them before synthesis.
It's definitely a trade-off (latency vs. index freshness), but for a bootstrapped tool, leveraging the NLM's billions of dollars in indexing infrastructure feels like the right lever to pull vs. trying to out-index them.
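If it helps to make that concrete: the retrieval half is just NCBI's public E-utilities endpoints. A rough Python sketch of the 'wide net' step (the reranking prompt itself is omitted, and any chat-completion API would slot in at the end):

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def mesh_search(term: str, retmax: int = 50) -> list[str]:
    """Strict Boolean/MeSH retrieval: returns candidate PMIDs."""
    r = requests.get(f"{EUTILS}/esearch.fcgi", params={
        "db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

def fetch_abstracts(pmids: list[str]) -> str:
    """Pull plain-text abstracts for the candidate set."""
    r = requests.get(f"{EUTILS}/efetch.fcgi", params={
        "db": "pubmed", "id": ",".join(pmids),
        "rettype": "abstract", "retmode": "text"})
    r.raise_for_status()
    return r.text

# Rerank step: feed the ~30-50 abstracts plus the clinical question
# to the LLM and ask it to keep only the genuinely relevant ones
# before synthesis.
```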
you should check mixedbread out. we support indexing multimodal data and making data ready for ai. we are adding video and audio support by the end of the year. might be interesting for the OP as well.
we have a couple of investigative journalists and lawyers using us for a similar use case.
Wish someone benchmarked Apple Vision Framework against these others. It's built into most Apple devices, but people don't know you can actually harness it to do fast, good quality OCR for you (and go a few extra steps to produce searchable pdfs, which is my typical use case). I'm very curious where it would fall in the benchmarks.
I mostly OCR English, so Japanese (as mentioned by parent) wouldn't be an issue for me, but I do care about handwriting. See, these insights are super helpful. If only there was, say, a benchmark to show these.
My main question really is: what are practical OCR tools that I can string together on my MacBook Pro M1 Max w/ 64GB Ram to maximize OCR quality for lots of mail and schoolwork coming into my house, all mostly in English.
I use ScanSnap Manager with its built in OCR tools, but that's probably super outdated by now. Apple Vision does way better job than that. I heard people say also that Apple Vision is better than Tesseract. But is there something better still that's also practical to run in a scripted environment on my machine?
This is the second comment of yours about LiveText (this is the older one https://news.ycombinator.com/item?id=43192141) — I found that one by complete coincidence because I'm trying to provide a Ruby API for these frameworks. However, I can't find much info on LiveText. What framework is it part of? Do you have any links or additional info? I found a source saying it's specifically for screen and camera capturing.
That's great, I'm going to give this a shot. If you have any more resources please do share. I don't mind Swift-only, because I'm writing little shims with `@_cdecl` for the bridge (don't have much experience here, but hoping this is going to work, leaning on AI for support).
The short answer is a tool like OwlOCR (which also has CLI support). The long answer is that there are tools on GitHub (I created a stars list: https://github.com/stars/maxim/lists/apple-vision-framework/) that try to use the framework for various things. I'm also trying to build an ffi-based Ruby gem that provides convenient access to the framework's functionality from Ruby.
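And if Ruby isn't a hard requirement, you can call the framework directly from a script via the pyobjc bindings. A minimal sketch, assuming `pyobjc-framework-Vision` is installed (macOS only):

```python
import Vision  # pip install pyobjc-framework-Vision
from Foundation import NSURL

def ocr_image(path: str) -> list[str]:
    """Run Apple's Vision text recognizer over one image, return the lines."""
    request = Vision.VNRecognizeTextRequest.alloc().init()
    request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(
        NSURL.fileURLWithPath_(path), {})
    success, error = handler.performRequests_error_([request], None)
    if not success:
        raise RuntimeError(str(error))
    # Each observation's top candidate is the recognizer's best guess.
    return [obs.topCandidates_(1)[0].string() for obs in request.results()]

print("\n".join(ocr_image("/tmp/scan.png")))
```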
Yeah, if it were cross-platform, maybe more people would be curious about it, but something that can only run on ~10% of the hardware out there doesn't make it very attractive to even begin spending time on Apple-exclusive stuff.
But you can have an Apple device deployed in your stack to handle the OCR, right? I get that on-device is a hardware limitation for many, but if you have an Apple device in your stack, can't you leverage this?
Yeah, but handling macOS as infrastructure capacity sucks. Apple really doesn't want you to, so tooling is almost non-existent. I've set up CI/CD stacks before that needed macOS builders, and they're always the most cumbersome machines to manage as infrastructure.
Alright, so the easy thing is done; now how do you actually manage them, keep them running, and do introspection without resorting to SSH or even remote desktop?
How do you manage any EC2 instance “without resorting to SSH”? Even for Linux EC2 instances, the right answer is often tools like Ansible, which do still use SSH under the hood.
You usually provision them via images that they then either install from or boot from directly. Not to mention there's countless infrastructure software out there that works for at least Linux, sometimes Windows, and seldom even macOS.
I specifically mentioned the imaging capability of EBS for Mac, which you dismissed as the easy part. Now you’re claiming that is the main thing? Well, good news!
And yes, Ansible (among other tools) can be used to manage macOS.
This discussion doesn’t seem productive. You have a preconceived viewpoint, and you’re not actually considering the problem or even doing 5 seconds of googling.
Managing a Mac fleet on AWS isn’t a real problem. If Apple’s OCR framework were significantly above the competition, it could easily be used. I would like to see benchmarks of it, as the other person was also asking for.
I don’t think 10% of anything would be considered relatively small, even if we’re talking about just 10 items: there are literally only 10 items, and this 1 is among them. Let alone billions of devices. Unless you want to reduce it to a tautology and, instead of answering “why isn’t it benchmarked”, just go for “10 is smaller than 90, so I’m right”.
My point is, I don’t think any comparative benchmark would ever exclude something based on “oh it’s just 10%, who cares.” I think the issue is more that Apple Vision Framework is not well known as an OCR option, but maybe it’s starting to change.
And another part of the irony is that Apple’s framework probably gets way more real world usage in practice than most of the tools in that benchmark.
The initial wish was that more people cared about Apple Vision Framework. I'm merely claiming that since most people don't actually have Apple hardware, they avoid Apple technology, as it commonly only runs on Apple hardware.
So I'm not saying it should be excluded because it can only be used by relatively few people; I was trying to communicate that I kind of get why not so many people care about it and why it gets forgotten, since most people wouldn't be able to run it even if they wanted to.
Instead, something like DeepSeek OCR could be deployed on any of the three major OSes (assuming there are implementations of the architecture available), so of course it gets a lot more attention and will be included in way more benchmarks.
I get what you're saying, I'm just disagreeing with your thought process. By that logic benchmarks would also not include the LLMs that they did, since most people wouldn't be able to run those either (it takes expensive hardware). In fact, more people would probably be able to run Vision framework than those LLMs, for cheaper (Vision is even on iPhones). I'm more inclined to agree if you say "maybe people just don't like Apple". :)
Hetzner is really great until you try to scale with them. We started building our service on top of Hetzner with a few hundred VMs running, and during peak times we had to scale to over 1,000 VMs. That's where a couple of problems started: you pretty often get IPs that are blacklisted, so if you try to connect to services hosted by Google or AWS (like S3), you can't reach them. Also, at one point there were no VMs available anymore in our region, which caused a lot of issues.
But in general, if you don't need to scale like crazy, Hetzner is amazing. We still have a lot of stuff running on Hetzner but fan out to other providers when we need to scale.
> Also at one point there were no VMs available anymore in our region, which caused a lot of issues.
I'm not sure this is much different from other clouds; at least a few years ago this was a weekly or even daily problem in GCP. My experience is that if you request hundreds of VMs rapidly during peak hours, all the clouds struggle.
Right now, we can’t request even a single (1) non-beefy non-GPU VM in us-east on Azure. That’s been going on for over a month now, and that’s after being a customer for 2 years :(
We launch 30k+ VMs a day on GCP, regularly launching hundreds at a time when scheduled jobs are running. That’s one of the most stable aspects of our operation - in the last 5 years I’ve never seen GCP “struggle” with that except during major outages.
At the scale of providers like AWS and even the smaller GCP, “hundreds of VMs” is not a large amount.
If you’re deploying something like 100 m5.4xlarge in us-east-1, sure, AWS’s capacity seems infinite. Once you get into high memory instances, GPU instances, less popular regions etc, it drops off.
Now maybe things have improved after the AI demand and the waves of hardware purchases that came with it, but it definitely wasn't the case at the large-scale employer I worked at in 2023 (my current employer is much smaller and doesn't have those needs, so I can't comment).
Note that we might be talking about two different things here: some of us use physical servers from Hetzner, which are crazy fast, and a great value. And some of us prefer virtual servers, which (IMHO) are not that revolutionary, even though still much less expensive than the competition.
I didn't know AWS and GCP also did it. Not surprised.
The problem is that European regulators do nothing about such anti-competitive dirty tricks. The big clouds hide behind "lots of spam coming from them", which is not true.
First comment on that post claims that according to Mimecast, 37% of EU-based spam originates from Hetzner and Digital Ocean. People have been asking for 3 days for a link to the source (I can't find it either).
On the other hand, someone linked a report from last year[0]:
> 72% of BEC attacks in Q2 2024 used free webmail domains; within those, 72.4% used Gmail. Roughly ~52% of all BEC messages were sent from Gmail accounts that quarter.
Worth noting that this seems to be about Hetzner's cloud product, not the dedicated servers. The cloud product is relatively new, and most of the people who move to Hetzner do so because of the dedicated instances, not to use their cloud.
No problem (I'm just glad you didn't read it as snark)! I mean, even 8 years is relatively new compared to their dedicated box offering so technically you were still correct.
I've run into the IP deny list problem too, but for Windows VMs - you spin them up, only to realise that you can't get Windows Updates, can't reach the PowerShell Gallery, etc.
And just deleting the VM and starting again gives you the exact same IP back!
I ended up having to buy a dozen or so IPs until I found one that wasn't blocked, and then I could delete all the blocked ones.
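If anyone has to repeat that dance, you can at least automate the "is this IP burned?" check against a public DNSBL before keeping an address. A rough stdlib-only Python sketch (zen.spamhaus.org as the example list; note the big providers also use internal lists, so this is only a first filter):

```python
import socket

def is_blacklisted(ip: str, dnsbl: str = "zen.spamhaus.org") -> bool:
    """DNSBLs answer for <reversed-octets>.<zone>; NXDOMAIN means clean."""
    reversed_ip = ".".join(reversed(ip.split(".")))
    try:
        socket.gethostbyname(f"{reversed_ip}.{dnsbl}")
        return True   # any A record back = listed
    except socket.gaierror:
        return False  # no record = not on this list

print(is_blacklisted("203.0.113.7"))
```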
This sounds really intriguing, and I am really curious. What kind of service do you run where you need hundreds of VMs? Was there a reason for not going dedicated? Looking at their offering, their biggest VM is 48 CPUs, 192 GB RAM, 960 GB SSD. I can't even imagine using that much. Again, I'm really curious.
we have extremely processing-heavy jobs where users upload large collections of files (audio, PDFs, videos, etc.) and expect fast processing. it's just that we need to fan out sometimes, since a lot of our users are sensitive to processing times.
I think they're great, but it's unfortunate they don't have more locations, which would at least enable you to spin VMs up elsewhere during a shortage. If you rely on them, it might be wise to have a second cloud provider you can use in a pinch; there are many options.
AWS at least maintains IP lists of bots, active exploiters, DDoS attackers, etc., that you can use to filter or rate-limit traffic in WAF. It's not so much AWS that blocks you but customers that decide to use these lists.
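For reference, that's the Amazon IP reputation managed rule group. A rough boto3 sketch of attaching it to a web ACL (the ACL name, scope, and metric names are placeholders):

```python
import boto3

wafv2 = boto3.client("wafv2", region_name="us-east-1")

wafv2.create_web_acl(
    Name="edge-acl",                      # placeholder name
    Scope="REGIONAL",                     # or "CLOUDFRONT"
    DefaultAction={"Allow": {}},
    Rules=[{
        "Name": "ip-reputation",
        "Priority": 0,
        # AWS-maintained list of known bots, exploiters, DDoS sources
        "Statement": {"ManagedRuleGroupStatement": {
            "VendorName": "AWS",
            "Name": "AWSManagedRulesAmazonIpReputationList",
        }},
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "ip-reputation",
        },
    }],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "edge-acl",
    },
)
```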
yeah, but if people would like to double-check the results it would be nice to have the actual benchmark. especially given that your playground is broken...
"We ran into an error processing your request. Please try again"
I think we are going through the langchain era for agents. the world will look really different in a couple of months, and the stack will be wildly different and more unified.
Yeah I think so. With LangChain, LangFlow took off because it was the "no-code" n8n style version that was layered on top of LangChain. To me it was always frustrating that it wasn't one ecosystem // fully interoperable. We're looking to make sure there's a good solution that works in either modality for agents.
That's actually not correct. Embeddings can handle relationships like "without" or "not" when trained for it. You need to scale up the training massively to make it generalize well. The current version of Mixedbread Search supports negatives like "tshirt without stripes". You can check it out in our launch video [1]. We are working on a much more generalized model, which should be able to capture relationships, emotions, and much more. The current models are just limited.
I was referring specifically to popular embedding models like OpenAI's and sentence-transformers, which (as far as I know) don't reliably handle negation or emotional nuance; they mostly capture topical similarity.
I don’t know enough of the underlying math to say for sure whether embeddings can be trained to consistently represent negation, but when I tried the Mixedbread demo myself with a query like “winter landscapes without sun and trees”, it still showed me paintings with both sun and trees. So at least in its current form, it doesn’t seem to fully handle those semantic relationships yet.
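The mismatch is easy to reproduce with the common open models; a quick sentence-transformers check shows how little negation moves the vector:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(["a tshirt with stripes", "a tshirt without stripes"])

# Similarity typically comes out close to 1.0: the model mostly
# sees 'tshirt' and 'stripes', not the negation between them.
print(util.cos_sim(emb[0], emb[1]).item())
```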
> $150M RR on just ads, +3x from August. On <1M users.
source: https://x.com/ArfurRock/status/1999618200024076620