
> Most of the weights in LLMs are 0,

That's interesting. Do you have a rough percentage for this?

Does this mean these connections have no influence at all on output?
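To make the question concrete: in a plain matrix multiply, a weight that is exactly zero really does contribute nothing to the output. A minimal numpy sketch (toy numbers, not from any real model):

```python
import numpy as np

# y = W @ x, so each output element is a sum of w_ij * x_j terms;
# any term where w_ij == 0 drops out regardless of the input value.
W = np.array([[0.5, 0.0, -1.2],
              [0.0, 0.0,  0.0]])
x = np.array([3.0, 100.0, 2.0])

y = W @ x

# Changing x[1] has no effect on y, because column 1 of W is all zeros:
x2 = x.copy()
x2[1] = -999.0
assert np.allclose(W @ x, W @ x2)
```

So in a linear layer, a zero weight is equivalent to the connection not existing at all; whether the hardware can exploit that to skip work is a separate question.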



My uneducated guess is that with many layers you can implement something akin to a graph in the brain by nulling lots of previous-layer outputs. I actually suspect that current models aren't optimal with all layers the same size, but I don't really know.


This is quite intuitive. We know that a biological neural net is a graph data structure, while ML systems on GPUs are more like layers of bitmaps in Photoshop (it's a graphics processor, after all). So if most of the entries are akin to transparent pixels, present only so that a graph can be emulated by stacking dense layers, that's extremely memory-inefficient.
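The memory-inefficiency point can be sketched numerically. Assuming (hypothetically, since the thread doesn't settle on a figure) that ~95% of a weight matrix is exactly zero, a dense layout still pays full price for every entry, while a sparse layout that stores only the nonzeros (here a simple COO-style accounting of value plus row/column indices) is far smaller:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
weights = rng.standard_normal((n, n))
weights[rng.random((n, n)) < 0.95] = 0.0  # assumed ~95% sparsity, for illustration

sparsity = 1.0 - np.count_nonzero(weights) / weights.size

dense_bytes = weights.nbytes  # 8 bytes per float64 entry, zeros included
nnz = np.count_nonzero(weights)
# COO-style storage: one float64 value plus two int32 indices per nonzero.
sparse_bytes = nnz * (8 + 4 + 4)

assert sparsity > 0.9
assert sparse_bytes < dense_bytes
```

This is only a storage argument; GPUs still tend to prefer the dense layout because unstructured sparsity maps poorly onto their memory access patterns, which is part of why the "transparent pixels" get kept around.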



