Flash Attention Minimal

A minimal implementation of Flash Attention 1 & 2 in just ~350 lines of CUDA code.

April 16, 2024 · 1 min · Franz Louis Cesista
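
The key trick such a minimal implementation rests on is the online-softmax recurrence: attention outputs are accumulated in a single streaming pass, rescaling a running max and normalizer so the full attention matrix never has to be materialized. Below is a hedged sketch of that recurrence for a single query row, written in plain Python for readability rather than the post's CUDA (function name and scalar-valued `values` are illustrative simplifications, not from the repo):

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Streaming softmax-weighted sum over (score, value) pairs.

    Maintains a running max `m`, normalizer `l`, and accumulator `acc`,
    rescaling past partial sums whenever a new maximum appears -- the
    same numerically stable recurrence Flash Attention tiles over blocks.
    """
    m = float("-inf")   # running max of scores seen so far
    l = 0.0             # running softmax normalizer
    acc = 0.0           # running weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        # Rescale previous partials to the new max (0.0 on the first step).
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        p = math.exp(s - m_new)
        l = l * scale + p
        acc = acc * scale + p * v
        m = m_new
    return acc / l
```

In the CUDA kernels this loop runs over tiles of keys/values held in shared memory, with `m`, `l`, and the accumulator kept per query row; Flash Attention 2 additionally defers the division by `l` to the end of the pass.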