A minimal re-implementation of Flash Attention with CUDA and PyTorch. The official implementation can be quite daunting for a CUDA beginner (like myself), so this repo tries to be small and educational.

  • The end goal of this repo is to implement Flash Attention-like kernels for the various hyperbolic attention algorithms, ultimately making them production-ready.
  • This was forked from Peter Kim’s flash-attention-minimal repo.
  • The variable names follow the notation of the original Flash Attention paper (Dao et al., 2022). A usage sketch follows below.
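
As a rough illustration of how such a kernel is typically used, here is a minimal sketch that JIT-compiles a CUDA extension and checks it against PyTorch's reference attention. The file and function names (`main.cpp`, `flash.cu`, `minimal_attn.forward`) are illustrative assumptions, not this repo's confirmed layout.

```python
# Sketch: build the CUDA kernel as a PyTorch extension and compare its
# output against naive softmax attention. Requires nvcc and a CUDA GPU.
# File/function names below are assumptions for illustration.
import math
import torch
from torch.utils.cpp_extension import load

# JIT-compile the extension from a C++ binding file and the CUDA kernel.
minimal_attn = load(name='minimal_attn', sources=['main.cpp', 'flash.cu'])

batch, heads, seq_len, head_dim = 16, 12, 64, 64
q = torch.randn(batch, heads, seq_len, head_dim, device='cuda')
k = torch.randn_like(q)
v = torch.randn_like(q)

# Reference: naive softmax attention in plain PyTorch.
scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
expected = torch.softmax(scores, dim=-1) @ v

# Flash Attention kernel under test (hypothetical binding name).
out = minimal_attn.forward(q, k, v)
print('max abs error:', (out - expected).abs().max().item())
```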