Hacker News | breadislove's comments

this might be interesting: https://www.theinformation.com/articles/chatgpt-doctors-star...

> $150M RR on just ads, +3x from August. On <1M users.

source: https://x.com/ArfurRock/status/1999618200024076620


Whoa. $150M ARR on ads is a wild stat.

Thanks for sharing that source. It really validates the thesis that unless the user pays (SaaS), the Pharma companies are the real customers.


You built a cool product. I'm actually one of the founders of https://medisearch.io which is similar to what you are building. I think the long-tail problem that you describe can be solved in other ways than with live APIs and you may find other problems with using live APIs.


Thanks! I just took a look at MediSearch. It looks really clean.

You are definitely right that Live APIs come with their own headaches (mostly latency and rate limits).

For now, I chose this path to avoid the infrastructure overhead of maintaining a massive fresh index as a solo dev. However, I suspect that as usage grows, I will have to move toward a hybrid model where I cache or index the 'head' of the query distribution to improve performance.
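A minimal sketch of that hybrid caching idea, with all names and the cache size illustrative; `fetch_live` stands in for the real live-API call and counts invocations so cache hits are visible:

```python
from functools import lru_cache

calls = {"live": 0}

def fetch_live(query):
    """Stand-in for the live API call; counts calls so cache hits show up."""
    calls["live"] += 1
    return f"results for {query}"

@lru_cache(maxsize=10_000)
def cached_search(query):
    # Head-of-distribution queries are served from the cache;
    # only the long tail pays live-API latency.
    return fetch_live(query)
```

In a real deployment the cache would likely need a TTL so the 'head' stays fresh, which `lru_cache` alone doesn't provide.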

Always great to meet others tackling this space. I’d love to swap notes sometime if you are open to it.


a good system (like openevidence) indexes every paper released, and semantic search can be incredibly helpful, since the search apis of all those providers are extremely limited in terms of quality.

now you get why those systems are not cheap. keeping indexes fresh, maintaining high quality at large scale and being extremely precise is challenging. by relying on distributed indexes you are at the mercy of the api providers, and i can tell you from previous experience that the results won't be consistently accurate.

for transparency: i am building a search api, so i am biased. but i have also been building medical retrieval systems for some time.


Appreciate the transparency and the insight from a fellow builder.

You are spot on that maintaining a fresh, high-quality index at scale is the 'hard problem' (and why tools like OpenEvidence are expensive).

However, I found that for clinical queries, Vector/Semantic Search often suffers from 'Semantic Drift'—fuzzily matching concepts that sound similar but are medically distinct.

My architectural bet is on Hybrid RAG:

Trust the MeSH: I rely on PubMed's strict Boolean/MeSH search for the retrieval because for specific drug names or gene variants, exact keyword matching beats vector cosine similarity.

LLM as the Reranker: Since API search relevance can indeed be noisy, I fetch a wider net (top ~30-50 abstracts) and use the LLM's context window to 'rerank' and filter them before synthesis.

It's definitely a trade-off (latency vs. index freshness), but for a bootstrapped tool, leveraging the NLM's billions of dollars in indexing infrastructure feels like the right lever to pull vs. trying to out-index them.
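The two-stage bet above can be sketched roughly like this (all names are illustrative; the reranker here is a trivial term-overlap stand-in for the LLM step, and the query builder just composes PubMed-style field tags):

```python
def build_mesh_query(term: str, condition: str) -> str:
    """Compose a strict PubMed-style Boolean query with MeSH field tags,
    so exact keyword matching does the retrieval."""
    return (f'("{term}"[Title/Abstract] OR "{term}"[MeSH Terms]) '
            f'AND "{condition}"[MeSH Terms]')

def rerank(query: str, abstracts: list, top_k: int = 5) -> list:
    """Stand-in reranker: keep the top_k abstracts by query-term overlap.
    The real pipeline would hand the wide net to an LLM instead."""
    terms = set(query.lower().split())
    return sorted(
        abstracts,
        key=lambda a: -len(terms & set(a.lower().split())),
    )[:top_k]
```

The real system would fetch ~30-50 abstracts for the built query and pass them to the LLM for filtering before synthesis.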


you should check mixedbread out. we support indexing multimodal data and making data ready for ai. we are adding video and audio support by the end of the year. might be interesting for the OP as well.

we have a couple of investigative journalists and lawyers using us for a similar use case.


curious how this compares to something like Memories.ai when it comes to video in particular?


or DeBERTa, but nevertheless super interesting!


For everyone wondering how good this and other benchmarks are:

- the OmniAI benchmark is bad

- Instead, check out OmniDocBench[1]

- Mistral OCR is far, far behind most open-source OCR models, and even further behind Gemini

- End to End OCR is still extremely tricky

- composed pipelines work better (layout detection -> reading order -> OCR every element)

- complex table parsing is still extremely difficult

[1]: https://github.com/opendatalab/OmniDocBench
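The composed pipeline named above (layout detection -> reading order -> OCR every element) can be sketched as follows; every stage here is a stand-in, since a real system would plug in a layout model, a reading-order model, and an OCR engine:

```python
def detect_layout(page):
    """Stand-in layout detector: a real model would find element boxes."""
    return page["elements"]

def reading_order(elements):
    """Stand-in reading-order model: top-to-bottom, then left-to-right."""
    return sorted(elements, key=lambda e: (e["y"], e["x"]))

def ocr_element(element):
    """Stand-in OCR engine: a real one would transcribe the cropped region."""
    return element["text"]

def composed_ocr(page):
    """Run the composed pipeline and join per-element transcriptions."""
    return "\n".join(ocr_element(e) for e in reading_order(detect_layout(page)))
```

The point of composing the stages is that each one can fail (and be benchmarked) independently, unlike an end-to-end model.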


Wish someone benchmarked Apple Vision Framework against these others. It's built into most Apple devices, but people don't know you can actually harness it to do fast, good quality OCR for you (and go a few extra steps to produce searchable pdfs, which is my typical use case). I'm very curious where it would fall in the benchmarks.


It is unusable trash for languages with any vertical writing such as Japanese. It simply doesn’t work.


Yeah, and it fails quickly at anything handwritten.


I mostly OCR English, so Japanese (as mentioned by the parent) wouldn't be an issue for me, but I do care about handwriting. See, these insights are super helpful. If only there were, say, a benchmark to show these.

My main question really is: what are practical OCR tools that I can string together on my MacBook Pro M1 Max w/ 64GB Ram to maximize OCR quality for lots of mail and schoolwork coming into my house, all mostly in English.

I use ScanSnap Manager with its built-in OCR tools, but that's probably super outdated by now. Apple Vision does a way better job than that. I've also heard people say that Apple Vision is better than Tesseract. But is there something better still that's also practical to run in a scripted environment on my machine?


LiveText too? It has a newer engine


This is the second comment of yours about LiveText (this is the older one https://news.ycombinator.com/item?id=43192141) — I found that one by complete coincidence because I'm trying to provide a Ruby API for these frameworks. However, I can't find much info on LiveText? What framework is it part of? Do you have any links or any additional info? I found a source where they say it's specifically for screen and camera capturing.


https://developer.apple.com/documentation/visionkit/imageana... VisionKit. Swift-only (as with many new APIs), so lots of people stuck on ObjC bridges simply ignore it.

It does not provide bounding boxes but you can get text.


That's great, I'm going to give this a shot. If you have any more resources please do share. I don't mind Swift-only, because I'm writing little shims with `@_cdecl` for the bridge (don't have much experience here, but hoping this is going to work, leaning on AI for support).


Interesting. How do you harness it for that purpose? I've found apple ocr to be very good.


The short answer is a tool like OwlOCR (which also has CLI support). The long answer is that there are tools on github (I created the stars list: https://github.com/stars/maxim/lists/apple-vision-framework/) that try to use the framework for various things. I’m also trying to build an ffi-based Ruby gem that provides convenient access in Ruby to the framework’s functionality.


Apple Shortcuts allows you to use OCR on images you pass into it. Look for “Extract Text from Image”.


Yeah, if it were cross-platform, maybe more people would be curious about it, but something that can only run on ~10% of the hardware people have isn't very attractive, so few even begin to spend time on Apple-exclusive stuff.


But you can have an apple device deployed in your stack to handle the OCR, right? I get on-device is a hardware limitation for many, but if you have an apple device in your stack, can’t you leverage this?


Yeah, but handling macOS as infrastructure capacity sucks; Apple really doesn't want you to, so tooling is almost nonexistent. I've set up CI/CD stacks before that needed macOS builders, and they're always the most cumbersome machines to manage as infrastructure.


AWS literally lets you deploy Macs as EC2 instances, which I believe includes all of AWS's usual EBS storage and disk imaging features.


Alright, so the easy part is done; now how do you actually manage them, keep them running, and do introspection without resorting to SSH or even remote desktop?


How do you manage any EC2 instance “without resorting to SSH”? Even for Linux EC2 instances, the right answer is often tools like Ansible, which do still use SSH under the hood.


You usually provision them via images that they then either install from or boot from directly. Not to mention there is countless infrastructure software out there that works for at least Linux, sometimes Windows, and seldom even macOS.


I specifically mentioned the imaging capability of EBS for Mac, which you dismissed as the easy part. Now you’re claiming that is the main thing? Well, good news!

And yes, Ansible (among other tools) can be used to manage macOS.

This discussion doesn’t seem productive. You have a preconceived viewpoint, and you’re not actually considering the problem or even doing 5 seconds of googling.

Managing a Mac fleet on AWS isn’t a real problem. If Apple’s OCR framework were significantly above the competition, it could easily be used. I would like to see benchmarks of it, as the other person was also asking for.


10% of hardware is an insanely vast amount, no?


Well, it's 90% less than what everyone else uses, so even if the total number is big, it has a relatively small user base.


I don’t think 10% of anything would be considered relatively small, even if we were talking about 10 items: there are literally only 10 items, and this 1 has the rare quality of being among them. Let alone billions of devices. Unless you want to reduce it to a tautology and, instead of answering “why isn’t it benchmarked,” just go for “10 is smaller than 90, so I’m right.”

My point is, I don’t think any comparative benchmark would ever exclude something based on “oh it’s just 10%, who cares.” I think the issue is more that Apple Vision Framework is not well known as an OCR option, but maybe it’s starting to change.

And another part of the irony is that Apple’s framework probably gets way more real world usage in practice than most of the tools in that benchmark.


The initial wish was that more people cared about the Apple Vision Framework; I'm merely claiming that since most people don't actually have Apple hardware, they avoid Apple technology, as it commonly only runs on Apple hardware.

So I'm not saying it should be excluded because it can only be used by relatively few people; I was trying to communicate that I kind of get why not many people care about it and why it gets forgotten, since most people wouldn't be able to run it even if they wanted to.

Instead, something like DeepSeek OCR can be deployed on any of the three major OSes (assuming there are implementations of the architecture available), so of course it gets a lot more attention and will be included in way more benchmarks.


I get what you're saying, I'm just disagreeing with your thought process. By that logic benchmarks would also not include the LLMs that they did, since most people wouldn't be able to run those either (it takes expensive hardware). In fact, more people would probably be able to run Vision framework than those LLMs, for cheaper (Vision is even on iPhones). I'm more inclined to agree if you say "maybe people just don't like Apple". :)


> the OmniAI benchmark is bad

According to the Omni OCR benchmark, Omni OCR is the best OCR. I'm sure you all will find no issues with these findings.


Hetzner is really great until you try to scale with them. We started building our service on top of Hetzner and had a couple hundred VMs running, and during peak times we had to scale to over 1000 VMs. That's where the problems started: you pretty often get IPs which are blacklisted, so if you try to connect to services hosted by Google or AWS (like S3), you can't reach them. Also, at one point there were no VMs available anymore in our region, which caused a lot of issues.

But in general, if you don't need to scale like crazy, Hetzner is amazing. We still have a lot of stuff running on Hetzner but fan out to other services when we need to scale.


> Also at one point there were no VMs available anymore in our region, which caused a lot of issues.

I'm not sure this is a difference from other clouds; at least a few years ago this was a weekly or even daily problem on GCP. My experience is that if you request hundreds of VMs rapidly during peak hours, all the clouds struggle.


Right now, we can’t request even a single (1) non-beefy non-GPU VM in us-east on Azure. That’s been going on for over a month now, and that’s after being a customer for 2 years :(


We launch 30k+ VMs a day on GCP, regularly launching hundreds at a time when scheduled jobs are running. That’s one of the most stable aspects of our operation - in the last 5 years I’ve never seen GCP “struggle” with that except during major outages.

At the scale of providers like AWS and even the smaller GCP, “hundreds of VMs” is not a large amount.


If you’re deploying something like 100 m5.4xlarge in us-east-1, sure, AWS’s capacity seems infinite. Once you get into high memory instances, GPU instances, less popular regions etc, it drops off.

Now, maybe after the AI demand and the waves of purchases of systems appropriate for it, things have improved, but it definitely wasn't the case at the large employer I worked at in 2023 (my current employer is much smaller and doesn't have those needs, so I can't comment).


Not a single VM possible to request on Azure us-east for over a month now though :-(


I was talking about real cloud providers :P


I don't use Azure much anymore, but I used to see this problem regularly on Azure too, especially in the more "niche" regions like UK South.


Note that we might be talking about two different things here: some of us use physical servers from Hetzner, which are crazy fast, and a great value. And some of us prefer virtual servers, which (IMHO) are not that revolutionary, even though still much less expensive than the competition.


The blocking of services on Hetzner and Scaleway by Microsoft is well known -

https://www.linkedin.com/posts/jeroen-jacobs-8209391_somethi...

I didn't know AWS and GCP also did it. Not surprised.

The problem is that European regulators do nothing about such anti-competitive dirty tricks. The big clouds hide behind "lots of spam coming from them", which is not true.


The first comment on that post claims that, according to Mimecast, 37% of EU-based spam originates from Hetzner and DigitalOcean. People have been asking for 3 days for a link to the source (I can't find it either).

On the other hand, someone linked a report from last year[0]:

> 72% of BEC attacks in Q2 2024 used free webmail domains; within those, 72.4% used Gmail. Roughly ~52% of all BEC messages were sent from Gmail accounts that quarter.

[0] https://docs.apwg.org/reports/apwg_trends_report_q2_2024.pdf


Worth noting that this seems to be about Hetzner's cloud product, not the dedicated servers. The cloud product is relatively new, and most people who move to Hetzner do so for the dedicated instances, not to use their cloud.


Hetzner's cloud offering is probably a decade old by now - I've been a very happy customer for 8 years.


Hetzner was founded in '97, so cloud offering could technically still be considered relatively new. :D


You're right! I seem to have mixed it up with some other dedi provider that added "cloud" recently. Thanks for the correction!

My point of people moving to Hetzner for the dedicated instances rather than the cloud still remains though, at least in my bubble.


No problem (I'm just glad you didn't read it as snark)! I mean, even 8 years is relatively new compared to their dedicated box offering so technically you were still correct.


1000 VMs?

So you have approx 1MM concurrent customers? That's a big number. You should definitely be able to get preferred pricing from AWS at that scale.


We have extremely processing-heavy jobs where users upload large collections of files (PDFs, audio, video, etc.) and expect fast processing.


I've run into the IP deny-list problem too, but for Windows VMs: you spin them up, only to realise that you can't get Windows Updates, can't reach the PowerShell Gallery, etc.

And just deleting it and starting again is just going to give you the exact same IP again!

I ended up having to buy a dozen or so IPs until I found one that wasn't blocked, and then I could delete all the blocked ones.
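That "buy a dozen IPs until one works" workaround can be sketched as below; all names are illustrative, with `allocate_ip`/`release_ip` standing in for a cloud provider's API and `is_blocked` for a reachability probe (e.g. trying Windows Update):

```python
def find_clean_ip(allocate_ip, release_ip, is_blocked, max_tries=12):
    """Allocate IPs until one passes the block check. Blocked IPs are held
    until a clean one is found, so the provider can't hand the same blocked
    IP straight back; only then are the duds released."""
    rejects, clean = [], None
    for _ in range(max_tries):
        ip = allocate_ip()
        if is_blocked(ip):
            rejects.append(ip)
        else:
            clean = ip
            break
    for ip in rejects:
        release_ip(ip)
    return clean
```

Holding the rejected IPs until the end is the key detail: releasing each one immediately would, as noted above, just hand you the same IP again.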


This sounds really intriguing, and I am really curious. What kind of service do you run where you need hundreds of VMs? Was there a reason for not going dedicated? Looking at their offering, their biggest VM is (48 CPU, 192 GB RAM, 960 GB SSD). I can't even imagine using that much. Again, I'm really curious.


we have extremely processing-heavy jobs where users upload large collections of files (audio, pdfs, videos etc.) and expect fast processing. it's just that we need to fan out sometimes, since a lot of our users are sensitive to processing times.


I think they're great, but it's unfortunate they don't have more locations, which would at least let you spin up VMs elsewhere during a shortage. If you rely on them, it might be wise to have a second cloud provider that you can use in a pinch; there are many options.


We scaled to ~1100 bare metal servers with them and it worked perfectly.


Username checks out.


Blacklisted by whom?


AWS at least maintains IP lists of bots, active exploiters, DDoS attackers, etc., that you can use to filter/rate-limit traffic in WAF. It's not so much AWS that blocks you, but customers that decide to use these lists.


Ironic, given how often the attacks I spend time fending off are coming from AWS.


So AWS could list some IPs of competitors, just enough to make them look unreliable.


Nice IPs you've got there Hetzner, shame if...


The big cloud providers I am assuming


guys, please release the benchmark or the benchmark code. like, this is just "trust me bro"


well, that's what the playground is for! playground.cognition.ai


This link redirects to https://cognition.ai/blog/swe-grep now?


got a lot of traffic and was taken down temporarily for a couple of reasons - the team got it online again last night


yeah, but if people would like to double-check the results, it would be nice to have the actual benchmark. especially given that your playground is broken...

"We ran into an error processing your request. Please try again"


I think we are going through the LangChain era for agents. the world will look really different in a couple of months, and the stack will be wildly different and more unified.


Yeah I think so. With LangChain, LangFlow took off because it was the "no-code" n8n style version that was layered on top of LangChain. To me it was always frustrating that it wasn't one ecosystem // fully interoperable. We're looking to make sure there's a good solution that works in either modality for agents.


That's actually not correct. Embeddings can handle relationships like “without” or “not” when trained for it; you need to scale up the training massively to make it generalize well. The current version of Mixedbread Search supports negatives like "tshirt without stripes". You can check it out in our launch video [1]. We are working on a much more generalized model, which should be able to capture relationships, emotions and much more. The current models are just limited.

[1]: https://www.mixedbread.com/blog/mixedbread-search


I was referring specifically to popular embedding models like OpenAI’s and sentence-transformers, which (as far as I know) don’t reliably handle negation or emotional nuance, they mostly capture topical similarity.

I don’t know enough of the underlying math to say for sure whether embeddings can be trained to consistently represent negation, but when I tried the Mixedbread demo myself with a query like “winter landscapes without sun and trees”, it still showed me paintings with both sun and trees. So at least in its current form, it doesn’t seem to fully handle those semantic relationships yet.
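A quick probe for this kind of failure can be written as below; `embed` here is a bag-of-words stand-in (not any real model), but swapping in a real encoder would run the same check. If the gap is near zero, the model is treating "with X" and "without X" as the same query:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: bag-of-words token counts (not a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def negation_gap(query_pos, query_neg, doc):
    """How much more the positive query matches the doc than the negated one.
    Near zero means negation is being ignored by the embedding."""
    d = embed(doc)
    return cosine(embed(query_pos), d) - cosine(embed(query_neg), d)
```

Ironically, even this toy bag-of-words stand-in shows a positive gap, because it at least sees "with" and "without" as different tokens; the reported failure on "winter landscapes without sun and trees" suggests dense models can collapse that distinction.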


yes, which one would you be interested in?

