Flash Attention Minimal

A minimal implementation of Flash Attention 1 & 2 in just ~350 lines of CUDA code.
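The heart of Flash Attention is the online-softmax trick: attention is computed by streaming over key/value tiles while keeping a running max and running denominator, so the full N × N score matrix is never materialized. The snippet below is a minimal CPU-side C++ sketch of that idea for a single query row (not the repository's actual CUDA kernels, and the shapes and tiling are illustrative assumptions):

```cpp
// Sketch of the online-softmax accumulation used by Flash Attention.
// Not the repository's code: a plain C++ illustration for one query vector.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// out = softmax(q . K^T / sqrt(d)) . V for a single query `q`,
// processed `tile` keys/values at a time.
void attention_one_query(const std::vector<float>& q,
                         const std::vector<float>& K,   // n x d, row-major
                         const std::vector<float>& V,   // n x d, row-major
                         std::vector<float>& out,       // d
                         int n, int d, int tile) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    float m = -INFINITY;              // running max of scores seen so far
    float l = 0.0f;                   // running softmax denominator
    std::vector<float> acc(d, 0.0f);  // running (unnormalized) output

    for (int start = 0; start < n; start += tile) {
        int end = std::min(start + tile, n);
        for (int j = start; j < end; ++j) {
            // Score for key j.
            float s = 0.0f;
            for (int k = 0; k < d; ++k) s += q[k] * K[j * d + k];
            s *= scale;

            // Online softmax update: rescale old state if the max grows.
            float m_new = std::max(m, s);
            float correction = std::exp(m - m_new);
            float p = std::exp(s - m_new);
            l = l * correction + p;
            for (int k = 0; k < d; ++k)
                acc[k] = acc[k] * correction + p * V[j * d + k];
            m = m_new;
        }
    }
    for (int k = 0; k < d; ++k) out[k] = acc[k] / l;
}

int main() {
    int n = 8, d = 4;
    std::vector<float> q(d, 0.1f), K(n * d, 0.2f), V(n * d, 0.3f), out(d);
    attention_one_query(q, K, V, out, n, d, /*tile=*/4);
    for (float x : out) std::printf("%f ", x);
    std::printf("\n");
}
```

The actual kernels additionally tile the queries, keep the running statistics in registers/shared memory, and (for Flash Attention 2) reorder the loops to reduce non-matmul work, but the rescaling logic above is the common core.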

April 16, 2024 · 1 min · Franz Louis Cesista

Llama.cpp

A C++ implementation of Meta’s Llama2 generative large language model. I also optimized Karpathy's original C implementation by parallelizing the multi-head attention layer.
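Parallelizing multi-head attention works because each head attends independently, so the outer loop over heads can be split across threads. Below is a hedged C++ sketch of that idea using OpenMP (the tensor layout and the use of OpenMP are assumptions for illustration, not necessarily what the repository does):

```cpp
// Sketch: multi-head attention with the head loop parallelized via OpenMP.
// Assumed layout: q is [n_heads][head_dim], k/v caches are
// [n_heads][seq_len][head_dim], out is [n_heads][head_dim].
// Compile with -fopenmp.
#include <algorithm>
#include <cmath>
#include <vector>

void multi_head_attention(const std::vector<float>& q,
                          const std::vector<float>& k_cache,
                          const std::vector<float>& v_cache,
                          std::vector<float>& out,
                          int n_heads, int seq_len, int head_dim) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

    // Heads are independent, so this loop parallelizes trivially.
    #pragma omp parallel for
    for (int h = 0; h < n_heads; ++h) {
        const float* qh = &q[h * head_dim];
        const float* kh = &k_cache[h * seq_len * head_dim];
        const float* vh = &v_cache[h * seq_len * head_dim];
        float* oh = &out[h * head_dim];

        // Attention scores for this head.
        std::vector<float> att(seq_len);
        for (int t = 0; t < seq_len; ++t) {
            float s = 0.0f;
            for (int i = 0; i < head_dim; ++i) s += qh[i] * kh[t * head_dim + i];
            att[t] = s * scale;
        }
        // Softmax over the scores.
        float maxv = att[0];
        for (int t = 1; t < seq_len; ++t) maxv = std::max(maxv, att[t]);
        float sum = 0.0f;
        for (int t = 0; t < seq_len; ++t) { att[t] = std::exp(att[t] - maxv); sum += att[t]; }
        // Weighted sum of values.
        for (int i = 0; i < head_dim; ++i) {
            float acc = 0.0f;
            for (int t = 0; t < seq_len; ++t) acc += (att[t] / sum) * vh[t * head_dim + i];
            oh[i] = acc;
        }
    }
}
```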

July 25, 2023 · 1 min · Franz Louis Cesista

Ateneo's Competitive Programming Varsity's Code Library

A collection of algorithms, data structures, and other useful reference material for competitive programming. It is used and maintained by members of the Ateneo de Manila University Programming Varsity.

1 min · Franz Louis Cesista