
It kind of reminds me of the vanishing gradient problem in early deep learning, where really deep networks wouldn't train because the gradients died partway through backprop, and the solution was to add bypass connections (ResNet-style skip connections). I wonder if there are similar solutions here. Of course, I think what happens in general is more like control theory: you should be able to detect going off-course with some probability and correct for it (over a longer horizon there's still some probability of leaving the safe zone, so you still get the exponential decay, just over a larger region). Not sure how to connect all these ideas, though.
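
To make the ResNet reference concrete, here's a toy numpy sketch of my own (not from the article, and the constants are just illustrative): backpropagating through a deep stack multiplies the gradient by one Jacobian per layer, so if each Jacobian shrinks vectors a bit, the product decays exponentially with depth, while adding an identity skip keeps the per-layer Jacobian close to I and the gradient alive.

    import numpy as np

    rng = np.random.default_rng(0)
    depth, width = 50, 64

    def grad_norm(use_skip: bool) -> float:
        # Backprop through `depth` linear layers: the gradient at the input is a
        # product of per-layer Jacobians applied to the output gradient. Without
        # skips each Jacobian is just W_l; with a skip it is (I + W_l), so the
        # product hugs the identity instead of shrinking layer by layer.
        g = np.ones(width)
        for _ in range(depth):
            W = 0.05 * rng.standard_normal((width, width))  # small weights -> contracting Jacobians
            J = np.eye(width) + W if use_skip else W
            g = J.T @ g
        return float(np.linalg.norm(g))

    print("plain stack:", grad_norm(use_skip=False))  # decays exponentially toward 0
    print("with skips :", grad_norm(use_skip=True))   # stays at a healthy magnitude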
