Hello Hacker News. Quanta Magazine is one of my favorite sources for stories about mathematics and computer science. I created a search engine that indexes articles through February 2021, and I hope you like it.
I’ve been in software engineering since the late 1990s. In 2012, I joined Ray Kurzweil’s NLP research group at Google, and stayed there until 2020. From 2017 forward, I led research into general-purpose, scalable question answering systems using neural networks. I focused on the full spectrum of system design, from designing neural network loss functions to exploring trade-offs in production architectures. This is still a nascent field, and is sometimes referred to as “neural information retrieval” or “semantic search”.
I left Google in 2020 to found ZIR AI. Our vision is to provide a cloud platform where organizations can upload their data and run text search, similar to services like Elastic or Algolia. Unlike those services, we leverage neural networks and deep learning extensively in our architecture.
I built quantasearch.club as a way to showcase the sorts of search experiences you can build easily on our platform. The entire app took about three days to develop, using the ZIR Platform to run the actual searches.
Please take a look. I would welcome your feedback.
Thanks so much for the detailed feedback. I agree with your high-level conclusions: semantic search is an open research area, and there are definitely cases where keyword search still works better. But even in its current form, I think it has some commercial applications.
We are working on improving the retrieval algorithms, so hopefully in the future the platform will not fail on some of the examples you provided.
> It seems it is using sentence level embeddings from the text, and does not use title?
The platform actually does use the document title when matching results, and it also selects a relevant snippet (often sentence-length, though it doesn't have to be) if it finds one. However, the snippet might not be good enough to display. For example, look at the second search result for the query "most difficult proofs":
"Pentagon Tiling Proof Solves Century-Old Math Problem"
The document matched, but the snippet wasn't good enough to show, so the introductory sentence of the article is shown instead.
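To make the fallback concrete, here is a minimal sketch in Python (chosen for brevity; the demo server itself is Java). It is not the platform's logic: the function names, the idea of a numeric snippet score, and the `min_score` threshold are all assumptions made for illustration.

```python
import re


def first_sentence(body: str) -> str:
    """Return the text up to the first sentence-ending punctuation mark.

    Simplification: abbreviations such as "e.g." will end the sentence early.
    """
    text = body.strip()
    match = re.search(r"(.+?[.!?])(\s|$)", text, flags=re.DOTALL)
    return match.group(1) if match else text


def display_text(snippet, snippet_score, body, min_score=0.5):
    """Show the snippet if it scores well enough, else the article's opening sentence.

    `snippet_score` and `min_score` are hypothetical; the real platform may
    decide snippet quality in a completely different way.
    """
    if snippet is not None and snippet_score >= min_score:
        return snippet
    return first_sentence(body)
```

With a low-scoring or missing snippet, `display_text` returns the first sentence of the body, which mirrors what the demo shows for the pentagon tiling article.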
For this particular demo, though, I wanted more fine-grained control of how the title was weighed against the body of the document, so I issued two separate queries to the platform and merged the results. I hope to release that code as part of an upcoming tutorial.
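Until that tutorial is out, here is a rough Python sketch of one way such a merge could work. It is not the demo's code: the weights, the `(doc_id, score)` tuple shape, and the assumption that scores from the two queries are directly comparable are all hypothetical.

```python
def merge_results(title_hits, body_hits, title_weight=0.3, body_weight=0.7, limit=10):
    """Combine two ranked result lists into one by weighted score sum.

    Each input is a list of (doc_id, score) pairs. A document found by both
    queries gets both weighted contributions; a document found by only one
    query gets just that one.
    """
    combined = {}
    for doc_id, score in title_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + title_weight * score
    for doc_id, score in body_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + body_weight * score
    # Highest combined score first; ties broken by doc_id for stable output.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Raising `title_weight` relative to `body_weight` favors articles whose titles match the query, which is the kind of control a single combined query would not expose in this sketch.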
> Can you provide more details about how does the demo work?
Yes, there were a few steps to building it:
1. I gathered the Quanta articles.
2. I converted them into JSON-formatted documents. Alternatively, PDF, markdown, and protocol buffers are supported.
3. Within my account on the ZIR platform, I created a corpus to hold the articles.
4. I uploaded the documents (through either the API or the drag-and-drop web interface).
The actual demo server is coded in Java.
5. It receives the query from the user and passes it directly to the platform for searching. (Here we actually issue two queries, but usually a single query is sufficient. I can discuss details if you'd like.)
6. The platform returns matching document IDs and snippets, which are joined to the document body and converted to JSON, which is sent to the browser for rendering.
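The last step can be sketched roughly as follows. This is Python for brevity rather than the Java the demo server uses, and the field names (`doc_id`, `snippet`, `title`, `body`) and the in-memory `articles` lookup are assumptions, not the platform's actual response format.

```python
import json


def build_response(hits, articles):
    """Join search hits to stored article data and serialize for the browser.

    `hits` is a list of dicts with hypothetical keys "doc_id" and "snippet".
    `articles` maps a document id to a dict with "title" and "body".
    Hits whose document is not in the local store are skipped.
    """
    results = []
    for hit in hits:
        article = articles.get(hit["doc_id"])
        if article is None:
            continue
        results.append({
            "id": hit["doc_id"],
            "title": article["title"],
            "snippet": hit["snippet"],
            "body": article["body"],
        })
    return json.dumps({"results": results})
```

Keeping the article bodies on the demo server and joining by document id means the search response only needs to carry ids and snippets.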
Thanks. I tried "why is the universe growing" on both your search engine and Quanta's main page search. I got better/more relevant results from your search engine. Good work.