Flash Hyperbolic Attention Minimal [WIP]

A minimal implementation of Flash Attention 1 & 2 in just ~350 lines of CUDA code. This is still a work in progress; the ultimate goal is to implement the various variants of Hyperbolic Attention in CUDA.
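The core trick both Flash Attention versions share is the online-softmax recurrence: keys and values are streamed in tiles while a running max and normalizer are rescaled on the fly, so the full attention matrix is never materialized. Here is a minimal sketch of that recurrence in plain C++ for a single query (names and tile size are illustrative; the repo itself implements this as tiled CUDA kernels):

```cpp
// A CPU-side sketch of the online-softmax recurrence behind Flash
// Attention. Single query, row-major K/V; everything here is
// illustrative rather than taken from the actual kernels.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<float> attend(const std::vector<float>& q,
                          const std::vector<std::vector<float>>& K,
                          const std::vector<std::vector<float>>& V,
                          int tile) {
    const int d = (int)q.size();
    const int n = (int)K.size();
    float m = -INFINITY;              // running max of the scores
    float l = 0.0f;                   // running softmax denominator
    std::vector<float> acc(d, 0.0f);  // unnormalized output accumulator

    for (int start = 0; start < n; start += tile) {
        const int end = std::min(start + tile, n);
        for (int j = start; j < end; ++j) {
            float s = 0.0f;           // score = dot(q, K[j]) / sqrt(d)
            for (int k = 0; k < d; ++k) s += q[k] * K[j][k];
            s /= std::sqrt((float)d);

            const float m_new = std::max(m, s);
            const float scale = std::exp(m - m_new);  // rescale old state
            const float p = std::exp(s - m_new);      // weight of the new key
            l = l * scale + p;
            for (int k = 0; k < d; ++k)
                acc[k] = acc[k] * scale + p * V[j][k];
            m = m_new;
        }
    }
    for (int k = 0; k < d; ++k) acc[k] /= l;  // normalize once at the end
    return acc;
}

int main() {
    const std::vector<float> q = {1.0f, 0.0f};
    const std::vector<std::vector<float>> K = {{1, 0}, {0, 1}, {1, 1}};
    const std::vector<std::vector<float>> V = {{1, 2}, {3, 4}, {5, 6}};
    const auto out = attend(q, K, V, /*tile=*/2);
    std::printf("%f %f\n", out[0], out[1]);
}
```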

Franz Louis Cesista
Cute Llama

Llama.cpp

A C++ implementation of Meta's Llama2 generative large language model. I also optimized Karpathy's original C implementation by parallelizing the multi-head attention layer.
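Each attention head computes its scores, softmax, and weighted sum independently of the others, so the per-head loop parallelizes with a single OpenMP pragma. A hedged sketch of that kind of change (the function signature, cache layout, and names are assumptions, not copied from the actual code):

```cpp
// Sketch of one decoding step of multi-head attention with the per-head
// loop parallelized via OpenMP. Illustrative layout and names only.
// Compile with: g++ -O2 -fopenmp mha.cpp
#include <algorithm>
#include <cmath>
#include <vector>

void multihead_attention(float* out, const float* q,
                         const float* key_cache, const float* value_cache,
                         int n_heads, int head_dim, int seq_len) {
    // Each head reads and writes disjoint slices, so the loop is safe to
    // parallelize with no further synchronization.
    #pragma omp parallel for
    for (int h = 0; h < n_heads; ++h) {
        const float* qh = q + h * head_dim;
        float* oh = out + h * head_dim;
        std::vector<float> att(seq_len);

        // Scores of this head's query against every cached key.
        for (int t = 0; t < seq_len; ++t) {
            const float* kh = key_cache + (t * n_heads + h) * head_dim;
            float s = 0.0f;
            for (int i = 0; i < head_dim; ++i) s += qh[i] * kh[i];
            att[t] = s / std::sqrt((float)head_dim);
        }
        // Numerically stable softmax over the scores.
        const float m = *std::max_element(att.begin(), att.end());
        float l = 0.0f;
        for (int t = 0; t < seq_len; ++t) {
            att[t] = std::exp(att[t] - m);
            l += att[t];
        }
        // Weighted sum of the cached values.
        std::fill(oh, oh + head_dim, 0.0f);
        for (int t = 0; t < seq_len; ++t) {
            const float* vh = value_cache + (t * n_heads + h) * head_dim;
            for (int i = 0; i < head_dim; ++i) oh[i] += (att[t] / l) * vh[i];
        }
    }
}

int main() {
    const int H = 2, D = 2, T = 3;
    std::vector<float> q = {1, 0, 0, 1};
    std::vector<float> K(T * H * D, 0.5f), V(T * H * D, 1.0f), out(H * D);
    multihead_attention(out.data(), q.data(), K.data(), V.data(), H, D, T);
    return 0;
}
```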

July 25, 2023 · Franz Louis Cesista
1st Page of Progvar's Team Notebook

The Code Library of Ateneo's Competitive Programming Varsity

A collection of algorithms, data structures, and other useful information for competitive programming, used and maintained by members of the Ateneo de Manila University Programming Varsity.

Franz Louis Cesista