Repo: https://github.com/leloykun/flash-attention-minimal

Summary

A minimal re-implementation of Flash Attention in CUDA and PyTorch. The official implementation can be quite daunting for a CUDA beginner (like me), so this repo tries to stay small and educational.

  • The end goal of this repo is to implement Flash Attention-style kernels for various attention algorithms and, eventually, to make them production-ready.
  • This was forked from Peter Kim’s flash-attention-minimal repo.
  • The variable names follow the notation of the original Flash Attention paper.
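
A quick way to try the kernel from Python might look like the sketch below. It is only a sketch: the source file names (main.cpp, flash.cu), the exposed forward(q, k, v) function, and the assumption that the kernel applies the standard 1/sqrt(d) softmax scaling all come from the upstream repo and may need adjusting to match this repo's layout.

# Sketch: JIT-building and calling a minimal Flash Attention kernel from PyTorch.
# File names, the `forward` entry point, and the softmax scaling are assumptions
# based on the upstream flash-attention-minimal repo; adjust as needed.
import torch
from torch.utils.cpp_extension import load

# Compile the CUDA extension on the fly.
minimal_attn = load(
    name="minimal_attn",
    sources=["main.cpp", "flash.cu"],
    extra_cuda_cflags=["-O2"],
)

# Small (batch, heads, seq_len, head_dim) tensors on the GPU.
q = torch.randn(1, 8, 128, 64, device="cuda")
k = torch.randn(1, 8, 128, 64, device="cuda")
v = torch.randn(1, 8, 128, 64, device="cuda")

# Run the minimal kernel and compare against PyTorch's reference attention
# (valid only if the kernel scales scores by 1/sqrt(head_dim)).
out = minimal_attn.forward(q, k, v)
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print("max abs diff:", (out - ref).abs().max().item())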

How to cite

@misc{cesista2024flashattentionminimal,
  author = {Franz Louis Cesista},
  title = {"Flash Attention Minimal"},
  year = {2024},
  url = {https://github.com/leloykun/flash-attention-minimal/},
}