- 64GB RAM
- FLOAT16 vectors (half the usual size)
- Binary quantization + oversampling
- 2-stage search (prefetch in RAM, rescore on disk)
- Async disk IO (via io_uring)
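To make the binary quantization, oversampling, and 2-stage search items concrete, here is a minimal pure-Python sketch of the general technique. It is not our production code or any particular vector database's API: the function names, the random toy data, and the `oversampling=3.0` value are all assumptions chosen for illustration. In a real deployment the binary codes would sit in RAM and the full-precision vectors would be read from disk during rescoring.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def binarize(vec):
    """Binary quantization: keep one bit per dimension (the sign)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query, vectors, codes, limit=5, oversampling=3.0):
    """Stage 1: cheap prefetch on binary codes. Stage 2: exact rescoring."""
    q_code = binarize(query)
    # Oversampling: fetch more candidates than needed to offset quantization error.
    n_candidates = min(len(codes), int(limit * oversampling))
    candidates = sorted(
        range(len(codes)), key=lambda i: hamming(q_code, codes[i])
    )[:n_candidates]
    # Rescore only the candidates with the original vectors
    # (hypothetically the on-disk copies in a real setup).
    rescored = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)
    return rescored[:limit]


# Toy data (hypothetical): 1000 random unit vectors with 64 dimensions.
random.seed(0)
dim = 64
vectors = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(1000)]
codes = [binarize(v) for v in vectors]

query = vectors[42]
top = two_stage_search(query, vectors, codes, limit=5, oversampling=3.0)
print(top)
```

Raising `oversampling` widens the candidate pool, which tends to improve recall at the cost of more full-precision reads in the second stage.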
We hit ~54GB RAM usage and achieved sub-second query times.
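For a rough sense of why FLOAT16 and binary quantization matter for RAM, the sketch below estimates raw vector storage at each precision. The vector count and dimensionality are hypothetical placeholders, not our actual dataset, and the estimate covers raw vector data only: it ignores index structures, payloads, and runtime overhead.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_vector_memory_gib(num_vectors, dim):
    """Raw storage for the vectors alone, at three precisions."""
    return {
        "float32": num_vectors * dim * 4 / GIB,   # 4 bytes per dimension
        "float16": num_vectors * dim * 2 / GIB,   # 2 bytes per dimension
        "binary": num_vectors * dim / 8 / GIB,    # 1 bit per dimension
    }


# Hypothetical example values; substitute your own dataset size.
est = estimate_vector_memory_gib(num_vectors=100_000_000, dim=768)
for name, gib in est.items():
    print(f"{name}: {gib:.1f} GiB")
```

The useful takeaway is the ratios: float16 halves the float32 footprint, and binary codes are 32 times smaller than float32, which is what makes it plausible to keep the quantized copy in RAM while the fuller-precision vectors live on disk.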
We documented everything, including memory breakdowns, hardware configs, upload scripts, and tuning tips.