Show HN: An ML-powered Search Engine for Quanta Magazine Articles (quantasearch.club)
8 points by svcrunch on April 13, 2021 | 7 comments


Hello Hacker News. Quanta Magazine is one of my favorite sources for stories about mathematics and computer science. I created a search engine that indexes articles through February 2021, and I hope you like it.

I’ve been in software engineering since the late 1990s. In 2012, I joined Ray Kurzweil’s NLP research group at Google, and stayed there until 2020. From 2017 forward, I led research into general-purpose, scalable question answering systems using neural networks. I focused on the full spectrum of system design, from designing neural network loss functions to exploring trade-offs in production architectures. This is still a nascent field, and is sometimes referred to as “neural information retrieval” or “semantic search”.

I left Google in 2020 to found ZIR AI. Our vision is to provide a cloud platform where organizations can upload their data and run text search, similar to services like Elastic or Algolia. Unlike those services, we leverage neural networks and deep learning extensively in our architecture.
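The basic idea behind semantic search is to rank documents by how close their learned vector embeddings are to the query's embedding, rather than by keyword overlap. Below is a minimal sketch of that generic technique in Java (the language the demo server is written in). It assumes the embeddings were already produced by some neural encoder. `DenseRanker`, `Doc`, and `Hit` are hypothetical names, and this is a brute-force cosine-similarity ranker, not ZIR AI's actual architecture.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy dense retrieval: rank documents by cosine similarity between a query
 * embedding and precomputed document embeddings (brute force, no index).
 */
final class DenseRanker {

    /** A document ID paired with its precomputed embedding vector. */
    record Doc(String id, double[] embedding) {}

    /** A document ID with its similarity score for one query. */
    record Hit(String id, double score) {}

    /** cos(a, b) = dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every document, sorts by descending similarity, and keeps the top k. */
    static List<Hit> topK(double[] query, List<Doc> docs, int k) {
        List<Hit> hits = new ArrayList<>();
        for (Doc d : docs) {
            hits.add(new Hit(d.id(), cosine(query, d.embedding())));
        }
        hits.sort((x, y) -> Double.compare(y.score(), x.score()));
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }
}
```

At real scale, scoring every document is usually replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.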

I built quantasearch.club as a way to showcase the sorts of search experiences you can build easily on our platform. The entire app took about three days to develop, using the ZIR Platform to run the actual searches.

Please take a look. I would welcome your feedback.


Very cool, congrats on launching Zir AI!

This topic is near and dear to my heart. Semantic search is amazing when it works, but it is notoriously hard to tell when it works.

Can you provide more details about how the demo works?

It seems it is using sentence-level embeddings from the text, and does not use the title?

A few search queries I tried gave better results than Google with the site: parameter (for example, "good science podcast" and "next gen batteries").

A few quirks I encountered:

Dealing with names and numbers:

https://quantasearch.club/#2010

https://quantasearch.club/#ray%20Kurzweil

Will it work for "normal" search queries in the future, or is your focus just on question-type queries?

https://quantasearch.club/#academy

Another problematic set of results (poor recall):

https://quantasearch.club/#best%20laptops

Thanks for sharing and good luck!


Thanks so much for the detailed feedback. I agree with your high-level conclusions: semantic search is an open research area, and there are definitely cases where keyword search still works better. But even in its current form, I think it has some commercial applications.

We are working on improving the retrieval algorithms, so hopefully in the future the platform will not fail on some of the examples you provided.

> It seems it is using sentence-level embeddings from the text, and does not use the title?

The platform actually does use the document title when matching results, and it also selects a relevant snippet (often sentence-length, but not necessarily) if it finds one. However, the snippet might not be good enough to display. For example, look at the second search result for the query "most difficult proofs":

https://quantasearch.club/#most%20difficult%20proofs

"Pentagon Tiling Proof Solves Century-Old Math Problem"

The document matched, but the snippet wasn't good enough to show, so the introductory sentence of the article is shown instead.
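The fallback described above (show the article's introductory sentence when the matched snippet isn't good enough) could be approximated roughly as follows. This is a minimal sketch: the snippet score, the threshold, and the naive sentence splitter are assumptions for illustration, and `SnippetChooser` is a hypothetical name. The thread doesn't say how the platform actually judges snippet quality.

```java
/** Hypothetical snippet-selection rule: use the matched snippet, else the first sentence. */
final class SnippetChooser {

    /**
     * Returns the snippet if one was supplied and its (assumed) quality score
     * meets the threshold; otherwise falls back to the body's first sentence.
     */
    static String choose(String snippet, double snippetScore, double threshold, String body) {
        if (snippet != null && snippetScore >= threshold) {
            return snippet;
        }
        return firstSentence(body);
    }

    /**
     * Naive splitter: returns text up to the first '.', '?' or '!' that is
     * followed by whitespace or the end of the text; otherwise the whole text.
     */
    static String firstSentence(String body) {
        String text = body.strip();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean terminal = c == '.' || c == '?' || c == '!';
            if (terminal && (i + 1 == text.length() || Character.isWhitespace(text.charAt(i + 1)))) {
                return text.substring(0, i + 1);
            }
        }
        return text;
    }
}
```

A real system would use a proper sentence segmenter; the punctuation rule here will mis-split abbreviations like "Dr." and is only meant to show the shape of the fallback.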

For this particular demo, though, I wanted more fine-grained control over how the title was weighted against the body of the document, so I issued two separate queries to the platform and merged the results. I hope to release that code as part of an upcoming tutorial.
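One simple way to merge a title query and a body query is a weighted sum of per-document scores. The sketch below assumes that approach. It is not the author's unreleased code; `ResultMerger`, `Result`, and the `titleWeight` parameter are hypothetical, and each input list is assumed to contain at most one entry per document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical merge of two result lists (title query, body query) by weighted score. */
final class ResultMerger {

    /** One result from a single query: a document ID and its relevance score. */
    record Result(String docId, double score) {}

    /**
     * Combined score = titleWeight * titleScore + (1 - titleWeight) * bodyScore.
     * A document missing from one list contributes 0 for that list.
     */
    static List<Result> merge(List<Result> titleHits, List<Result> bodyHits, double titleWeight) {
        if (titleWeight < 0 || titleWeight > 1) {
            throw new IllegalArgumentException("titleWeight must be in [0, 1]");
        }
        Map<String, Double> combined = new HashMap<>();
        for (Result r : titleHits) {
            combined.merge(r.docId(), titleWeight * r.score(), Double::sum);
        }
        for (Result r : bodyHits) {
            combined.merge(r.docId(), (1 - titleWeight) * r.score(), Double::sum);
        }
        List<Result> merged = new ArrayList<>();
        combined.forEach((id, s) -> merged.add(new Result(id, s)));
        // Highest combined score first; ties broken by ID so the order is deterministic.
        merged.sort((a, b) -> {
            int cmp = Double.compare(b.score(), a.score());
            return cmp != 0 ? cmp : a.docId().compareTo(b.docId());
        });
        return merged;
    }
}
```

A linear blend like this only makes sense if both queries return scores on a comparable scale. When they don't, rank-based fusion is a common alternative.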

> Can you provide more details about how the demo works?

Yes, there were a few steps to building it:

1. I gathered the Quanta articles.

2. I converted them into JSON-formatted documents. PDF, Markdown, and protocol buffers are also supported.

3. Within my account on the ZIR platform, I created a corpus to hold the articles.

4. I uploaded the documents (through either the API or the drag-and-drop web interface).

The actual demo server is coded in Java.

5. It receives the query from the user and passes it directly to the platform for searching. (Here we actually issue two queries, but usually a single query is sufficient. I can discuss the details if you'd like.)

6. The platform returns matching document IDs and snippets. These are joined with the document bodies, converted to JSON, and sent to the browser for rendering.
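Step 6 (joining the platform's hits with the stored articles and emitting JSON) might look something like this in the Java server. Everything here is an assumption: the `Hit` and `Article` types, the JSON field names, and the hand-rolled escaping stand in for whatever the real server and a JSON library do. The call to the ZIR platform itself is omitted because its API isn't described in the thread.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: join platform hits with stored articles and serialize to JSON. */
final class ResponseBuilder {

    /** What the search platform is assumed to return per match. */
    record Hit(String docId, String snippet) {}

    /** An article as stored by the demo server. */
    record Article(String title, String body) {}

    /** Builds a JSON array in hit order; hits whose ID is unknown locally are skipped. */
    static String toJson(List<Hit> hits, Map<String, Article> articles) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (Hit hit : hits) {
            Article a = articles.get(hit.docId());
            if (a == null) {
                continue;
            }
            array.add("{\"id\":" + quote(hit.docId())
                    + ",\"title\":" + quote(a.title())
                    + ",\"snippet\":" + quote(hit.snippet())
                    + ",\"body\":" + quote(a.body()) + "}");
        }
        return array.toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

In practice a JSON library would replace `quote`, but keeping the join and serialization in one small method makes the data flow from platform to browser easy to follow.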

Hopefully this clarifies the architecture a bit.


I asked "what is the biggest gap between primes?" and I was delighted by the returned results.

[https://quantasearch.club/#what%20is%20the%20biggest%20gap%2...?]


:) Thanks for trying it. Prime numbers are an area of interest for me, too.


Thanks. I tried "why is the universe growing" on both your search engine and Quanta's main page search. I got better/more relevant results from your search engine. Good work.


Thank you for trying it out!



