Sensitivity and Sharpness of Gated Linear Attention Mechanisms

We derive sensitivity and sharpness bounds for Gated DeltaNet and Mamba 2, showing that they can be made 1-Lipschitz with appropriate parameter constraints.

January 2, 2026 · 16 min · Franz Louis Cesista

Block Matrix Formulation of Linear Attention Mechanisms

How the block matrix formulation of linear attention mechanisms enables multi-step online gradient descent at inference time and chunk-wise parallelism.

March 16, 2025 · 17 min · Franz Louis Cesista

(Linear) Attention as Test-Time Regression

A unifying framework that views linear attention mechanisms as test-time regression, and how to parallelize their training and inference.

January 27, 2025 · 7 min · Franz Louis Cesista