Hacker News

Good news - we've included optimized causal and non-causal versions of the FlashAttention backward pass in TK - would love for you to check them out!

causal: https://github.com/HazyResearch/ThunderKittens/blob/main/exa...

non-causal: https://github.com/HazyResearch/ThunderKittens/blob/main/exa...



Awesome. Do you happen to have a benchmark against the latest (v9.1) cuDNN implementation?


@pama, if useful - here are utilization numbers for our attention backwards kernels (causal and non-causal, head dim = 64): https://github.com/HazyResearch/ThunderKittens/blob/main/att...
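For readers unfamiliar with how utilization figures like these are typically derived: a minimal sketch of the standard FLOP accounting, assuming the common FlashAttention convention that the backward pass costs roughly 2.5x the forward pass and that causal masking halves the work. The function names and the peak-TFLOPs figure here are illustrative, not taken from the TK benchmark scripts.

```python
def attn_flops(batch, heads, seqlen, head_dim, causal=False, backward=False):
    # Forward pass per head: two matmuls (Q @ K^T and P @ V),
    # each 2 * seqlen^2 * head_dim FLOPs.
    flops = 4 * batch * heads * seqlen * seqlen * head_dim
    if causal:
        # Roughly half the score matrix is masked out.
        flops //= 2
    if backward:
        # Backward is ~2.5x forward under the usual accounting
        # (five matmuls vs. two, including the recomputation).
        flops = flops * 5 // 2
    return flops

def utilization(flops, runtime_ms, peak_tflops):
    # Achieved TFLOP/s divided by the hardware's peak TFLOP/s.
    return (flops / (runtime_ms * 1e-3)) / (peak_tflops * 1e12)

# Example: non-causal backward, head_dim = 64, against a
# hypothetical 989 peak TFLOPs (H100 BF16 dense, illustrative).
f = attn_flops(batch=16, heads=16, seqlen=4096, head_dim=64, backward=True)
print(f"{utilization(f, runtime_ms=5.0, peak_tflops=989):.1%}")
```

The same accounting with `causal=True` explains why causal kernels report lower raw FLOP counts for the same problem size even when wall-clock time also drops.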


amazing work! thank you!


Thanks @lucidrains :)



