This would change if there weren't a culture of giving 5 stars to every driver. It started because Uber unfairly punished good drivers over honest 4/5 reviews, and now every Uber driver who uses their phone while driving or has an interior smelling of cigarette smoke gets 5 stars out of obligation.
> - Cursor (can't remember which model, the default)
> - Google's Jules
> - OpenAI Codex with o4
Cursor's "default model" rarely works for me. You have to choose one of the models yourself. Sonnet 4, Gemini 2.5 Pro, and for tricky problems, o3.
There is no public release of o4; you used o4-mini, a model that performs worse than any of the frontier models (Sonnet 4, Gemini 2.5 Pro, o3).
Jules and Codex, if they're like Claude Code, do not work well with "Build me a Facebook clone"-type instructions. You have to break everything down and make your own tech-stack decisions, even if you use these tools to do so. Yes, they're not perfect: they introduce regressions, forget to run linters, or skip checking their work with the compiler. But they work extremely well if you learn to use them, just like any other tool. They are not yet magic that works without you putting in any effort to learn them.
What is your preferred static text embedding model?
For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?
It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc.). For retrieval, minishlab/potion-retrieval-32M works best. If performance is critical, minishlab/potion-base-32M is best, although it's a bit bigger (~100 MB).
There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and very high throughput, these models will still let you make decent-quality embeddings; an attention-based model will of course be better, but also more expensive.
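For anyone who wants to try it, usage is just a couple of lines (a minimal sketch based on model2vec's documented API; the sample sentences are placeholders):

```python
# Minimal sketch: load a potion model and embed some text with model2vec.
from model2vec import StaticModel

# Distilled static model; swap in potion-retrieval-32M for retrieval tasks.
model = StaticModel.from_pretrained("minishlab/potion-base-8M")

# encode() returns a numpy array of shape (n_sentences, embedding_dim).
embeddings = model.encode(["example query", "another example sentence"])
print(embeddings.shape)
```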
Thanks, man, this is incredible work; I really appreciate the detail you went into.
I've been wondering whether some miracle could make embeddings 10x faster for my search app, which uses minilmv3, and it sounds like there is one :) I never would have dreamed it. I'll definitely be trying potion-base in my Flutter x ONNX library.
EDIT: I was thanking you for the thorough benchmarking; then it dawned on me that you're on the team that built the model. Fantastic work, I can't wait to try this. And you already have ONNX!
EDIT2: Craziest demo I've seen in a while. I'm seeing a 23x speedup after 10 minutes of work.
ONNX Runtime supports CoreML, though if the ORT maintainers' experience is anything like mine converting an embedding model with Apple's CoreML conversion tool, I can see why that backend would go unmaintained.
It took multiple tries to get the model to convert to the mlpackage format at all, and then a lot of experimenting to get it to run on the ANE instead of the GPU, only to discover that constant reshaping was killing any performance benefit (either you have a fixed multiplication size or you don't bother). Even at a fixed size, and using the attention mask, its operations were slower than saturating the GPU with large batches.
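For anyone curious, the conversion flow I'm describing looks roughly like this (a hedged sketch, not my actual code; the toy encoder, names, and shapes are stand-ins):

```python
# Sketch of converting a traced PyTorch encoder to an .mlpackage with a
# fixed input shape, requesting the ANE. Toy model for illustration only.
import numpy as np
import torch
import torch.nn as nn
import coremltools as ct

class TinyEncoder(nn.Module):
    # Stand-in for a real embedding model.
    def __init__(self, vocab=30522, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)  # mean-pool to one vector

example = torch.randint(0, 30522, (1, 128))
traced = torch.jit.trace(TinyEncoder().eval(), example)

mlmodel = ct.convert(
    traced,
    # Fixed shape: as noted above, variable shapes kill any ANE benefit.
    inputs=[ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)],
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # ask for the ANE
    minimum_deployment_target=ct.target.iOS17,
)
mlmodel.save("encoder.mlpackage")
```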
I discovered an issue where using the newer iOS 18 standard would cause the model conversion to break, and filed an issue on their GitHub, including an example repository for easy reproduction. I got a response quickly, but almost a year later the bug is still unfixed.
Even George Hotz, when he tried hacking around Apple's really bad and unmaintained CoreML library to use the ANE directly, gave up because it was impossible without breaking some pretty core OS features (certificate signing, IIRC).
Apple is just not serious about making the ANE usable through CoreML. Even Apple's internal MLX team can't crack that nut.
ONNX is horrible for anything with variable input shapes, which is why nobody uses it for LLMs. It is fundamentally poorly designed for anything that doesn't take a fixed-size image.
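To make the complaint concrete: you can declare dynamic axes at export time, but every runtime then has to plan around shapes it only learns at inference (a sketch with a toy stand-in model; the names are illustrative):

```python
# Sketch: exporting with dynamic batch/sequence axes. The export works,
# but runtimes often re-plan or fall back when shapes actually vary.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    # Toy stand-in for a real text model.
    def __init__(self, vocab=30522, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)

dummy = torch.randint(0, 30522, (1, 128))  # (batch, seq_len)
torch.onnx.export(
    TinyEncoder().eval(),
    (dummy,),
    "encoder.onnx",
    input_names=["input_ids"],
    output_names=["embeddings"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq_len"}},
    opset_version=17,
)
```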
Note: I searched "Protein bars", and it treated all protein bars equally. The 1st through 20th cheapest had <15g of protein per bar. I had to scroll down to the 50th-60th to find bars with 20g of protein, which, to my surprise, were cheaper than Kirkland Signature's protein bars.
Not Dune exactly, but having to run 'eval $(opam env)' in the terminal every time you open an OCaml project, rather than the npm-like default where you can just open the directory and use the package-manager command without thinking about it.
The only issues I've had with OCaml's build system were from using "ocamlopt", "ocamlbuild", and "ocamlfind" manually, but that was solved by OASIS and now Dune. I don't need to think about it: it automatically compiles when I save a file in Emacs, and the one-time setup is very easy.
Guatemala is the only place outside the US where very pushy taxi drivers did not quote me 3-5x the Uber price.
Taxi drivers are scam artists and thieves. There’s no reputational damage either, as you will never see them again.
Uber solves the reputation problem: every driver is rated, and poorly rated and badly behaved drivers do not get to work for them.