Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

They didn't remove the hidden state entirely, they just removed it from the input, forget and update gates. I haven't digested the paper either, but I think that in the case of a GRU this means that the hidden state update masking (z_t and r_t in the paper's formulas) only depends on the new input, not the input plus the prior hidden state.


Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: