Do you mean it's faster when the embeddings are pre-computed, or is it also faster when the embeddings are computed on the fly?
Also, what's the recommended way to store the ColBERT embeddings? Because of the 2D nature of the embeddings, it's not practical to store them in a vector database.
Yes, ColBERT is fast because you can pre-compute most embeddings. It's important to compute document embeddings only once. neural-cherche does not compute embeddings on the fly: the retrieve method expects query and document embeddings rather than query and document texts.
Document and query embeddings can be obtained with the .encode_documents and .encode_queries methods.
I save most of my embeddings (a Python dictionary with document ids as keys and embeddings as values) using joblib in a bucket in the cloud. I don't really know if it's good practice, but it scales fine to a few million documents for offline (non-real-time) applications.
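For anyone curious, the scheme above can be sketched roughly like this. The document ids, file name, and array shapes here are made up for illustration; the key point is that joblib serializes the whole dict of 2D token-level arrays to a single file, which you can then upload to a bucket and reload once at startup:

```python
import joblib
import numpy as np

# Hypothetical ColBERT-style embeddings: one 2D array per document
# (num_tokens x embedding_dim), keyed by document id.
embeddings = {
    "doc-1": np.random.rand(128, 64).astype(np.float32),
    "doc-2": np.random.rand(96, 64).astype(np.float32),
}

# Persist the whole dictionary to one file; this file is what gets
# uploaded to the cloud bucket.
joblib.dump(embeddings, "embeddings.joblib")

# Later (or on another machine), reload everything in one call and
# check the round trip was lossless.
restored = joblib.load("embeddings.joblib")
assert restored.keys() == embeddings.keys()
assert np.allclose(restored["doc-1"], embeddings["doc-1"])
```

The nice part is that you sidestep the vector-database question entirely for offline workloads: the dict is the index, and lookup by document id is free.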
I didn't recognize them by name, but after reading the Wikipedia page, I remember they are the company that acquired Dragon Systems in exchange for a large amount of stock.
If I remember the story correctly, the founders of Dragon Systems sued the bank that handled the acquisition. They claimed the bank convinced them to take stock instead of cash at a point when the same bank was already preparing for the inevitable bankruptcy of L&H. They lost because M&A was considered a separate branch of the bank that was not supposed to know about the imminent bankruptcy.
BTW: The founders of Dragon Systems are a married couple with an interesting bio too.
The assignments in the old course are in MATLAB. There are also unofficial Python versions of the assignments, which can be submitted for grades on Coursera. You might want to have a look at them.