I don't know how complete the digitization of old texts is, but if you go to worldwide.espacenet.com, search for "airship" and reverse sort by date, you get documents from the 1880s.

In fact, I'm downloading a whole batch of patent texts right now because I want to experiment with semantic search on them.

Anyone here have any pointers on what the state-of-the-art method for semantic search through a large corpus would be? I've just started researching, and BERT and friends seem to have been popular about 2 years ago, but things move so fast that I wouldn't know what I should do now.

What about a medium-sized corpus, say 100,000 pages of text?