Flash Attention Minimal
A minimal implementation of Flash Attention 1 & 2 in just ~350 lines of CUDA code.
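The repository itself is CUDA, but the core idea Flash Attention builds on is the online-softmax recurrence: process keys/values in tiles while keeping only a running max, a running sum, and a rescaled output accumulator, so the full attention matrix is never materialized. The sketch below is a plain C++ illustration of that recurrence for a single query, processing one key at a time for clarity; it is not the repository's kernel, and the function name and layout are mine.

```cpp
#include <cmath>
#include <vector>
#include <algorithm>

// Illustrative sketch (not the repo's CUDA kernel): the online-softmax
// recurrence behind Flash Attention. Computes softmax(q·K^T / sqrt(d)) · V
// for one query vector of dimension d, keeping only running statistics.
std::vector<float> attention_one_query(const std::vector<float>& q,
                                       const std::vector<std::vector<float>>& K,
                                       const std::vector<std::vector<float>>& V,
                                       int d) {
    float m = -INFINITY;            // running max of scores seen so far
    float l = 0.0f;                 // running sum of exp(score - m)
    std::vector<float> o(d, 0.0f);  // running (unnormalized) output
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));

    for (size_t j = 0; j < K.size(); ++j) {
        float s = 0.0f;                           // score = scale * (q · k_j)
        for (int t = 0; t < d; ++t) s += q[t] * K[j][t];
        s *= scale;

        float m_new = std::max(m, s);
        float correction = std::exp(m - m_new);   // rescale old statistics
        float p = std::exp(s - m_new);            // weight of the new key

        l = l * correction + p;
        for (int t = 0; t < d; ++t)
            o[t] = o[t] * correction + p * V[j][t];
        m = m_new;
    }
    for (int t = 0; t < d; ++t) o[t] /= l;        // final normalization
    return o;
}
```

Flash Attention 1 and 2 apply this update block-wise in on-chip shared memory, which is where the ~350 lines of CUDA come in; the recurrence itself is unchanged.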
A C++ implementation of Meta’s Llama2 generative large language model. I also optimized Karpathy’s original C implementation by parallelizing the multi-head attention layer.
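The description does not say how the parallelization was done, so the sketch below is only one plausible version: since the heads of multi-head attention are independent, the per-head loop can be split across CPU threads with an OpenMP pragma. The function signature and buffer names (`xb`, `q`, `att`, `key_cache`, `value_cache`) are hypothetical stand-ins for the usual Llama2 inference buffers, not the project's actual code.

```cpp
#include <cmath>

// Illustrative sketch (hypothetical names): parallelizing multi-head
// attention over heads. Build with OpenMP enabled (e.g. -fopenmp);
// without it the pragma is ignored and the loop runs serially.
void multihead_attention(float* xb, const float* q, float* att,
                         const float* key_cache, const float* value_cache,
                         int n_heads, int head_size, int pos, int seq_len) {
    #pragma omp parallel for
    for (int h = 0; h < n_heads; ++h) {
        const float* qh = q + h * head_size;     // this head's query
        float* atth = att + h * seq_len;         // this head's score buffer

        // Dot-product scores against every cached key position, then softmax.
        float maxv = -INFINITY;
        for (int t = 0; t <= pos; ++t) {
            const float* k = key_cache + t * n_heads * head_size + h * head_size;
            float score = 0.0f;
            for (int i = 0; i < head_size; ++i) score += qh[i] * k[i];
            score /= std::sqrt(static_cast<float>(head_size));
            atth[t] = score;
            if (score > maxv) maxv = score;
        }
        float sum = 0.0f;
        for (int t = 0; t <= pos; ++t) {
            atth[t] = std::exp(atth[t] - maxv);
            sum += atth[t];
        }

        // Weighted sum of cached values into this head's output slice.
        float* out = xb + h * head_size;
        for (int i = 0; i < head_size; ++i) out[i] = 0.0f;
        for (int t = 0; t <= pos; ++t) {
            const float* v = value_cache + t * n_heads * head_size + h * head_size;
            float w = atth[t] / sum;
            for (int i = 0; i < head_size; ++i) out[i] += w * v[i];
        }
    }
}
```

Because each head writes only to its own slices of `att` and `xb`, the loop iterations share no mutable state and the pragma is safe without further synchronization.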
A collection of algorithms, data structures and other useful information for competitive programming. Used and maintained by members of the Ateneo de Manila University Programming Varsity.