
Did it work? :)

The architecture is very similar to offset LSTMs, which have been studied extensively. The main difference is the handover of the hidden state, which my naive mind would assume makes optimization substantially more difficult.



I haven't had a chance to read the preprint carefully or play with the code yet. The best place to follow what's happening is the GitHub repo, specifically its open and closed issues and pull requests.


I'll wait until more benchmarks are run in this case. Unlike with traditional software, verifying that a model architecture works better than the alternatives is a time- and compute-intensive process. Outside of general-purpose models (which this is not), you really can't just download it and "try it out".



