A C++ implementation of Meta’s Llama 2 generative large language model. I also optimized Andrej Karpathy’s original C implementation by parallelizing the multi-head attention layer, among other changes.
Llama.cpp
July 25, 2023 · Franz Louis Cesista · GitHub Repository
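The post does not show the attention change itself, so what follows is only a minimal C++ sketch of the general idea, not the repository's code. When decoding one token, each attention head reads the shared query and key/value caches and writes only its own output slice. That makes the heads independent, so they can run in parallel. The memory layout, the function names `attend_one_head` and `multihead_attention`, and the choice of `std::thread` (rather than, say, OpenMP) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical memory layout (an assumption, loosely modeled on llama2.c):
//   q:       [n_heads * head_size] query of the current token
//   k_cache: [seq_len][n_heads * head_size] cached keys, rows 0..pos valid
//   v_cache: same layout as k_cache, holding cached values
//   att:     [n_heads][seq_len] scratch scores, one row per head
//   out:     [n_heads * head_size] attention output
// Requires 0 <= pos < seq_len.

// Single-query scaled dot-product attention for head h at position pos.
static void attend_one_head(int h, int pos, int n_heads, int head_size,
                            const float* q, const float* k_cache,
                            const float* v_cache, float* att_h, float* out) {
    const int dim = n_heads * head_size;
    const float* qh = q + h * head_size;
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_size));

    // 1. Scores of this head's query against every cached key.
    float max_score = -std::numeric_limits<float>::infinity();
    for (int t = 0; t <= pos; ++t) {
        const float* kh = k_cache + t * dim + h * head_size;
        float s = 0.0f;
        for (int i = 0; i < head_size; ++i) s += qh[i] * kh[i];
        att_h[t] = s * scale;
        max_score = std::max(max_score, att_h[t]);
    }
    // 2. Softmax over positions 0..pos (max subtracted for stability).
    float sum = 0.0f;
    for (int t = 0; t <= pos; ++t) {
        att_h[t] = std::exp(att_h[t] - max_score);
        sum += att_h[t];
    }
    for (int t = 0; t <= pos; ++t) att_h[t] /= sum;
    // 3. Attention-weighted sum of cached values into this head's slice.
    float* oh = out + h * head_size;
    std::fill(oh, oh + head_size, 0.0f);
    for (int t = 0; t <= pos; ++t) {
        const float* vh = v_cache + t * dim + h * head_size;
        for (int i = 0; i < head_size; ++i) oh[i] += att_h[t] * vh[i];
    }
}

// Heads only read the shared q/k_cache/v_cache and write disjoint slices of
// att and out, so they can be distributed across threads without locking.
void multihead_attention(int pos, int seq_len, int n_heads, int head_size,
                         const float* q, const float* k_cache,
                         const float* v_cache, float* att, float* out,
                         int n_threads) {
    n_threads = std::max(1, std::min(n_threads, n_heads));
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (int w = 0; w < n_threads; ++w) {
        // Worker w handles heads w, w + n_threads, w + 2 * n_threads, ...
        workers.emplace_back([=] {
            for (int h = w; h < n_heads; h += n_threads)
                attend_one_head(h, pos, n_heads, head_size, q, k_cache,
                                v_cache, att + h * seq_len, out);
        });
    }
    for (std::thread& t : workers) t.join();
}
```

Every head goes through the same sequence of floating-point operations no matter which thread runs it, so the multi-threaded output should match a single-threaded run exactly. Creating threads on every call adds per-token overhead. A practical version would more likely put `#pragma omp parallel for` over the head loop or reuse a persistent thread pool.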
