
Human behaviour, in a sense far more general than which token we might type next. To mimic humans as closely as possible with the basic deep-learning approach, train something that can predict human behaviour in general. Training would require billions to quadrillions of hours of video, audio, and probably many other inputs, from many different people engaged in the full variety of human activity.


Humans have a brain that physically changes, especially in childhood, which is potentially a massive advantage.


Why? By age 25 an adult has only ~146k hours of waking visual "training" experience, most of it repetitive, derivative, and unproductive. And whatever evolution encoded can be observed directly in their genome, so it doesn't need to be retrained over millions of years of evolution.
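
Back-of-envelope for that 146k figure, assuming roughly 16 waking hours a day (a sketch, not a measurement):

    years = 25
    waking_hours_per_day = 16              # assumption: ~8 hours of sleep
    waking_hours = years * 365 * waking_hours_per_day
    print(waking_hours)                    # 146,000 hours of waking experience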


Much of that time also includes physical interaction with the world, which makes it far more valuable because it can improve performance in a focused way.


What better way to learn to do menial physical tasks like house cleaning and produce picking?


Neural nets seem to learn much more slowly than humans. Even GPT-2 has seen orders of magnitude more tokens of language than a human experiences in a lifetime. At least as far as language is concerned, humans are able to extract a lot more information from their training data.
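
Rough, hedged numbers behind that comparison: GPT-2's WebText corpus is commonly estimated at around 10 billion tokens, while the words a person hears or reads by adulthood are usually put in the low hundreds of millions, so the gap is roughly one to two orders of magnitude depending on which estimates you use:

    # Order-of-magnitude comparison; both figures are rough published estimates.
    gpt2_tokens = 10e9       # WebText, often estimated at ~10B tokens
    human_words = 0.3e9      # ~300M words heard/read by early adulthood (rough)
    print(gpt2_tokens / human_words)   # ~33x, i.e. one to two orders of magnitude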


Humans are also extensively pretrained by billions of years of evolution, so by starting from scratch GPT is admittedly disadvantaged from the get-go.


But the human genome is only 3 gigabytes and the vast majority of that is unlikely to be encoding brain structure.


"Only" 3 gigabytes.

The lambda calculus (a system we know is capable of infinite self-complexity, learning, etc. with the right program) can be described in a few hundred bits. And a neural net can be described in the lambda calculus in perhaps a few thousand bits.
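
As a toy illustration of how small that core is (Church-numeral arithmetic written with Python lambdas; purely a demonstration, not a measurement of the bit count):

    # Pure lambda-calculus arithmetic, embedded in Python for readability.
    zero = lambda f: lambda x: x
    succ = lambda n: lambda f: lambda x: f(n(f)(x))
    add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    to_int = lambda n: n(lambda k: k + 1)(0)   # decode a Church numeral
    two, three = succ(succ(zero)), succ(succ(succ(zero)))
    print(to_int(add(two)(three)))             # 5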

Also, we have no idea how "compressed" the genome is.


Basic structure can be encoded, yes (and obviously is, given that brains have a consistent structure), but the weights or parameters, presuming that brains learn via synaptic weights, obviously do not fit in the genome.

Compression still must obey information theory.
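
A rough information-theoretic comparison, using commonly cited estimates purely for scale (the exact numbers are assumptions, not measurements):

    # Genome capacity vs. a naive encoding of synaptic weights.
    base_pairs      = 3.2e9            # human genome, ~3.2 billion base pairs
    genome_bits     = base_pairs * 2   # 2 bits per base -> roughly 0.8 GB
    synapses        = 1.0e14           # often-cited ~100 trillion synapses
    bits_per_weight = 1                # an extremely generous lower bound
    weight_bits     = synapses * bits_per_weight
    print(weight_bits / genome_bits)   # ~15,000x more bits than the genome holds

Even at a single bit per synapse, the raw weights exceed the genome's information content by roughly four orders of magnitude, which is the point about compression having to obey information theory.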


True, but we don’t know how to recreate that kind of pretraining.


One difference is that humans are actively involved in data collection: when there is a gap in their knowledge, they don't just wait for the information to show up; they ask a question, etc.


Humans do not train on video; the idea of a video, or even of a frame, is a high-level abstraction within the human brain.



