It would be great to semantically search through literature with embeddings. At least one person I know of is trying to generate a vector database of all arXiv papers.

The big problem I see is attribution and citations. An embedding is just a vector. It doesn't contain any citation back to the source material, a modification date, or a certificate of authenticity. So in RAG, embeddings only serve to link back to a particular page of source material.
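To make that concrete, here is a minimal sketch of what "embedding as a link" tends to look like in practice: the vector sits next to provenance fields, and a search returns those fields rather than the vector. Everything here is an assumption for illustration: the `Chunk` record, its field names, the toy 3-dimensional vectors, and the `paper-A`/`paper-B` identifiers are hypothetical and not taken from any particular vector database.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    vector: tuple          # the embedding itself; it carries no provenance
    source_id: str         # hypothetical paper identifier
    page: int              # page of the source the chunk came from
    modified: str          # ISO date of the source revision


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, index, k=1):
    """Return the citation metadata of the k chunks most similar to the query."""
    ranked = sorted(index, key=lambda c: cosine(query, c.vector), reverse=True)
    return [(c.source_id, c.page, c.modified) for c in ranked[:k]]


# Toy index with made-up vectors and identifiers.
index = [
    Chunk((1.0, 0.0, 0.0), "paper-A", 3, "2024-01-15"),
    Chunk((0.0, 1.0, 0.0), "paper-B", 7, "2023-06-02"),
]

result = search((0.9, 0.1, 0.0), index)
```

The citation you get out is only as granular as the metadata stored beside the vector, which in this sketch is a paper and a page.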

Using embeddings as links doesn't dramatically change the way citation and attribution are handled in technical writing. You still end up citing a whole paper or a page of a paper.

I think GraphRAG [1] is a more useful thing to build on for technical literature. There are ways to use graphs to cite a particular concept on a particular page of an academic paper, and for the 'citations' to act as bidirectional links between new and old scientific discourse. But I digress.

[1] https://microsoft.github.io/graphrag/
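A rough sketch of the concept-level, bidirectional idea, under loud assumptions: this is not GraphRAG's API or data model, just a plain adjacency structure. The `CitationGraph` class, the `(paper, page, concept)` node shape, and all the identifiers are hypothetical.

```python
from collections import defaultdict


class CitationGraph:
    """Stores each citation edge in both directions."""

    def __init__(self):
        self.cites = defaultdict(set)      # node -> nodes it cites
        self.cited_by = defaultdict(set)   # node -> nodes that cite it

    def add_citation(self, src, dst):
        # Record the forward link and the backlink together so that
        # old work can be navigated to the newer work that builds on it.
        self.cites[src].add(dst)
        self.cited_by[dst].add(src)


# A node is a (paper, page, concept) triple, all made up for illustration.
old = ("paper-old", 4, "concept-x")
new = ("paper-new", 2, "concept-y")

graph = CitationGraph()
graph.add_citation(new, old)

forward = graph.cites[new]        # what the new paper's concept cites
backward = graph.cited_by[old]    # who cites the old paper's concept
```

The node key is where the granularity comes from: dropping the concept field gives back ordinary page-level citation.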




IMO, for technical writing, citing a page or section within a page is usually good enough. I rarely need to cite a particular concept. But I've never even thought of the possibility of more granular concept-level citations and will definitely be pondering it more!



