Hacker News

Good news - we've included optimized causal and non-causal versions of the FlashAttention backward pass in TK - would love for you to check them out!

causal: https://github.com/HazyResearch/ThunderKittens/blob/main/exa...

non-causal: https://github.com/HazyResearch/ThunderKittens/blob/main/exa...



Awesome. Do you happen to have a benchmark against the latest (v9.1) cuDNN implementation?


@pama, if useful - here are utilization numbers for our attention backwards kernels (causal and non-causal, head dim = 64): https://github.com/HazyResearch/ThunderKittens/blob/main/att...
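For readers unfamiliar with how utilization figures like these are typically derived: a minimal sketch of the standard FLOP accounting, assuming the common FlashAttention convention that the backward pass costs roughly 2.5x the forward pass and that causal masking halves the work. The function names and the peak-TFLOPs figure here are illustrative, not taken from the TK benchmark scripts.

```python
def attn_flops(batch, heads, seqlen, head_dim, causal=False, backward=False):
    # Forward pass per head: two matmuls (Q @ K^T and P @ V),
    # each 2 * seqlen^2 * head_dim FLOPs.
    flops = 4 * batch * heads * seqlen * seqlen * head_dim
    if causal:
        # Roughly half the score matrix is masked out.
        flops //= 2
    if backward:
        # Backward is ~2.5x forward under the usual accounting
        # (five matmuls vs. two, including the recomputation).
        flops = flops * 5 // 2
    return flops

def utilization(flops, runtime_ms, peak_tflops):
    # Achieved TFLOP/s divided by the hardware's peak TFLOP/s.
    return (flops / (runtime_ms * 1e-3)) / (peak_tflops * 1e12)

# Example: non-causal backward, head_dim = 64, against a
# hypothetical 989 peak TFLOPs (H100 BF16 dense, illustrative).
f = attn_flops(batch=16, heads=16, seqlen=4096, head_dim=64, backward=True)
print(f"{utilization(f, runtime_ms=5.0, peak_tflops=989):.1%}")
```

The same accounting with `causal=True` explains why causal kernels report lower raw FLOP counts for the same problem size even when wall-clock time also drops.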


amazing work! thank you!


Thanks @lucidrains :)



