Searching 400M image vectors on modest hardware

sabrinaaquino · 2025-03-28T02:20:01

We recently ran an experiment to upload and search the LAION-400M dataset (~400M 512d CLIP vectors) using a fairly minimal setup:

- 64GB RAM
- FLOAT16 vectors (half the usual size)
- Binary quantization + oversampling
- 2-stage search (prefetch in RAM, rescore on disk)
- Async disk IO (via io_uring)
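For anyone unfamiliar with the binary quantization + oversampling + rescoring combo, here is a toy sketch of the idea in plain Python. It is not the code from our setup: the function names, the sign-based binarization, and the dot-product rescoring are assumptions made for illustration, and everything lives in memory here, whereas in the real setup the full vectors are read from disk during the rescoring step.

```python
import random

def binarize(vec):
    """Pack the sign of each component into one int (1 bit per dimension)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, full_vectors, binary_codes, limit=5, oversampling=4.0):
    """Hypothetical helper: prefetch on binary codes, then rescore on full vectors."""
    q_code = binarize(query)
    # Oversampling: fetch more candidates than needed to offset quantization error.
    n_candidates = min(len(binary_codes), int(limit * oversampling))
    # Stage 1 (in RAM): cheap Hamming-distance ranking over the 1-bit codes.
    candidates = sorted(
        range(len(binary_codes)),
        key=lambda i: hamming(q_code, binary_codes[i]),
    )[:n_candidates]
    # Stage 2 (on disk in the real setup): exact scoring of the few candidates.
    rescored = sorted(candidates, key=lambda i: dot(query, full_vectors[i]), reverse=True)
    return rescored[:limit]

if __name__ == "__main__":
    random.seed(0)
    dim = 64  # toy dimension; the dataset uses 512
    vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
    codes = [binarize(v) for v in vectors]
    print(two_stage_search(vectors[42], vectors, codes, limit=5, oversampling=4.0))
```

The point of the split is that stage 1 touches only the small binary codes, while stage 2 touches only `limit * oversampling` full vectors per query, which keeps the number of disk reads small.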

We hit ~54GB RAM usage and achieved sub-second query times.
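A quick back-of-the-envelope calculation shows why this fits. The sketch below assumes exactly 400M vectors, 1 bit per dimension for the binary codes, and decimal gigabytes (10^9 bytes); it covers raw vector storage only and does not model the index or anything else that contributes to the ~54GB figure.

```python
N_VECTORS = 400_000_000   # assumption: "~400M" taken as exactly 400M
DIM = 512                 # CLIP vector dimension from the post
GB = 10**9                # assumption: decimal gigabytes

def storage_gb(bytes_per_vector):
    """Raw storage for all vectors, in GB, ignoring any index overhead."""
    return N_VECTORS * bytes_per_vector / GB

float32_gb = storage_gb(DIM * 4)   # 4 bytes per dimension
float16_gb = storage_gb(DIM * 2)   # 2 bytes per dimension
binary_gb = storage_gb(DIM // 8)   # 1 bit per dimension = 64 bytes per vector

print(f"float32: {float32_gb:.1f} GB")
print(f"float16: {float16_gb:.1f} GB")
print(f"binary:  {binary_gb:.1f} GB")
```

Under these assumptions the float32 vectors would take about 819.2 GB, the float16 vectors about 409.6 GB, and the binary codes about 25.6 GB. Only the binary codes come anywhere near fitting in 64GB of RAM, which is why the full-precision vectors stay on disk and are only read for rescoring.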

We documented everything, including memory breakdowns, hardware configs, upload scripts, and tuning tips.