Sensitivity and Sharpness of Gated Linear Attention Mechanisms
We derive sensitivity and sharpness bounds for Gated DeltaNet and Mamba 2, showing that they can be made 1-Lipschitz with appropriate parameter constraints.
We cover the block-matrix formulation of linear attention mechanisms, multi-step online gradient descent at inference time, and chunk-wise parallelism.
We present a unifying framework that casts linear attention mechanisms as test-time regression and show how to parallelize both training and inference.
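To make the recurrent view concrete, here is a minimal sketch of a gated delta-rule state update of the kind used by Gated DeltaNet, written with NumPy. The function names, the per-step gate `alpha`, and the write strength `beta` are illustrative assumptions, not the paper's exact parameterization; the state update follows the commonly stated recurrence S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T.

```python
import numpy as np

def gated_deltanet_step(S, k, v, alpha, beta):
    """One step of a gated delta-rule update (illustrative sketch).

    S: (d_v, d_k) state matrix; k: (d_k,) key, assumed unit-norm;
    v: (d_v,) value; alpha in [0, 1] decay gate; beta in [0, 1] write strength.
    Recurrence: S <- alpha * S @ (I - beta * k k^T) + beta * v k^T
    """
    d_k = k.shape[0]
    return alpha * S @ (np.eye(d_k) - beta * np.outer(k, k)) + beta * np.outer(v, k)

def linear_attention_readout(S, q):
    # The output at each step is a linear read of the state: o_t = S_t q_t.
    return S @ q
```

For unit-norm keys and alpha, beta in [0, 1], the transition matrix alpha * (I - beta k k^T) has spectral norm at most 1, which is the contraction property underlying the 1-Lipschitz claim in the abstract.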