
It kind of reminds me of the vanishing gradient problem in early deep learning, where really deep networks wouldn't train because the gradients died partway through backprop, and the solution was to add bypass connections (ResNet-style skip connections). I wonder if there are similar solutions here. Of course, I think what happens in general is more like control theory: you should be able to detect going off-course with some probability and correct for it (over a longer horizon there's still some probability of leaving the safe zone, so you still get the exponential decay, just over a larger region). Not sure how to connect all these ideas, though.
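
To make the ResNet reference concrete, here's a toy numpy sketch of my own (not from the article, and the constants are just illustrative): backpropagating through a deep stack multiplies the gradient by one Jacobian per layer, so if each Jacobian shrinks vectors a bit, the product decays exponentially with depth, while adding an identity skip keeps the per-layer Jacobian close to I and the gradient alive.

    import numpy as np

    rng = np.random.default_rng(0)
    depth, width = 50, 64

    def grad_norm(use_skip: bool) -> float:
        # Backprop through `depth` linear layers: the gradient at the input is a
        # product of per-layer Jacobians applied to the output gradient. Without
        # skips each Jacobian is just W_l; with a skip it is (I + W_l), so the
        # product hugs the identity instead of shrinking layer by layer.
        g = np.ones(width)
        for _ in range(depth):
            W = 0.05 * rng.standard_normal((width, width))  # small weights -> contracting Jacobians
            J = np.eye(width) + W if use_skip else W
            g = J.T @ g
        return float(np.linalg.norm(g))

    print("plain stack:", grad_norm(use_skip=False))  # decays exponentially toward 0
    print("with skips :", grad_norm(use_skip=True))   # stays at a healthy magnitude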
