
This is for LLMs, which deal mainly with text. An entire book can be stored in about 0.42 MB, according to https://www.quora.com/How-many-megabytes-are-in-a-book.

424 terabytes of text is over a billion books' worth of data. The Common Crawl website even says "Over 250 billion pages spanning 17 years." That's an impressive amount of information.
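
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and takes the 0.42 MB per book figure from the Quora link at face value; the function name `books_in_corpus` is just made up for illustration.

```python
# Back-of-the-envelope check: how many 0.42 MB books fit in 424 TB?
# Assumption: decimal (SI) units, not binary MiB/TiB.
BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12


def books_in_corpus(corpus_tb: float, book_mb: float = 0.42) -> float:
    """Return how many books of size book_mb fit in corpus_tb of text."""
    return (corpus_tb * BYTES_PER_TB) / (book_mb * BYTES_PER_MB)


estimate = books_in_corpus(424)
print(f"{estimate / 1e9:.2f} billion books")
```

That comes out to a little over a billion, which matches the claim above. Using binary units instead would scale the result slightly but not change the order of magnitude.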



LLMs can deal with more than text. What's impressive today is nothing tomorrow.


The technology that allows an LLM to "see" images and video is completely different, though. It's not what is being trained on Common Crawl.


Not really. Embeddings are embeddings. Check out LLaVA.



