
These vectors are lower-dimensional than traditional sparse vectors though, aren't they? Dense embeddings typically run from the hundreds to low thousands of dimensions (roughly 128-1024), whereas a TF-IDF vector has one dimension per vocabulary term. It's also not just about being flat-out better, but about increasing recall: you pick up content that doesn't contain the query keywords but is still relevant. You're also free to mix the two approaches together in one result set, which gives you the best of both.
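
A rough sketch of what mixing the two result sets can look like, using reciprocal rank fusion (the toy corpus is made up and the random "dense" vectors just stand in for a real embedding model; only the fusion step is the point):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "how to restore a postgres database from a backup",
        "postgresql dump and restore walkthrough",
        "tuning garbage collection in the jvm",
    ]
    query = "restore postgres backup"

    # Keyword side: TF-IDF, one dimension per vocabulary term.
    tfidf = TfidfVectorizer()
    doc_tfidf = tfidf.fit_transform(docs)
    kw_scores = cosine_similarity(tfidf.transform([query]), doc_tfidf).ravel()

    # Dense side: placeholder 384-d vectors; swap in a real embedding model here.
    rng = np.random.default_rng(0)
    doc_dense = rng.normal(size=(len(docs), 384))
    query_dense = rng.normal(size=(1, 384))
    dense_scores = cosine_similarity(query_dense, doc_dense).ravel()

    def rrf(ranks, k=60):
        # Reciprocal rank fusion: 1 / (k + rank), summed across result lists.
        return 1.0 / (k + ranks)

    kw_ranks = kw_scores.argsort()[::-1].argsort()       # rank per doc, 0 = best
    dense_ranks = dense_scores.argsort()[::-1].argsort()
    fused = rrf(kw_ranks) + rrf(dense_ranks)

    for i in fused.argsort()[::-1]:
        print(f"{fused[i]:.4f}  {docs[i]}")

Fusing on ranks rather than raw scores means the TF-IDF and cosine numbers never have to be put on a common scale.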



The problems with dimensionality certainly show up even with 256 dimensions. PCA-ing down to a few hundred dimensions is still a problem, and then you have to deal with PCA lossiness too!
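
For a feel of the effect being referred to, here's a quick toy check with random unit vectors (which only stand in for real embeddings): as the dimension grows, pairwise cosine similarities concentrate, so the contrast between near and far neighbours shrinks.

    import numpy as np

    rng = np.random.default_rng(0)
    for dim in (8, 64, 256, 1024):
        x = rng.normal(size=(1000, dim))
        x /= np.linalg.norm(x, axis=1, keepdims=True)    # unit-normalise
        sims = x @ x.T                                   # pairwise cosine similarities
        off_diag = sims[~np.eye(len(sims), dtype=bool)]  # drop self-similarities
        print(f"dim={dim:5d}  mean={off_diag.mean():+.3f}  std={off_diag.std():.3f}")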


Nobody used TF-IDF for vector lookups without applying PCA first, though.
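
Roughly what that reduce-then-index step looks like; TruncatedSVD (i.e. LSA) is used here since it works directly on the sparse TF-IDF matrix, where classic PCA would want it densified first. The toy corpus and the tiny component count are stand-ins: on a real corpus you'd reduce the vocabulary-sized matrix to a few hundred components, and the explained-variance number is the lossiness the parent mentions.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "postgres backup and restore with pg_dump",
        "jvm garbage collection tuning guide",
        "kubernetes pod scheduling and affinity rules",
        "writing unit tests for async python code",
        "profiling slow sql queries in postgres",
        "rust borrow checker errors explained",
    ]

    tfidf = TfidfVectorizer().fit_transform(docs)
    print(tfidf.shape)    # (n_docs, vocabulary_size) -- one dimension per term

    svd = TruncatedSVD(n_components=4, random_state=0)
    reduced = svd.fit_transform(tfidf)
    print(reduced.shape)  # (n_docs, 4) -- a few hundred in practice

    # The lossiness in question: how much variance the kept components retain.
    print(f"variance retained: {svd.explained_variance_ratio_.sum():.2f}")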



