Muon and a Selective Survey on Steepest Descent in Riemannian and Non-Riemannian Manifolds

Muon from first principles, what makes it different from other optimizers, and why it works so well.

March 31, 2025 · Franz Louis Cesista

Napkin Math on Non-Euclidean Trust Region Optimization

A possible reason why Muon converges faster & does better at higher learning rates than Adam.

March 24, 2025 · Franz Louis Cesista

Blocked Matrix Formulation of Linear Attention Mechanisms

The blocked matrix formulation of linear attention mechanisms, multi-step online gradient descent at inference time, and chunk-wise parallelism.

March 16, 2025 · Franz Louis Cesista

Steepest Descent Under Schatten-p Norms

Why Muon still works despite not perfectly semi-orthogonalizing the gradients.

February 27, 2025 · Franz Louis Cesista

Squeezing 1-2% Efficiency Gains Out of Muon by Optimizing the Newton-Schulz Coefficients

Simply switching to Muon can already get you 2x efficiency gains. But you can squeeze out an extra 1-2% by optimizing the Newton-Schulz coefficients.

February 21, 2025 · Franz Louis Cesista

CASPR Without Accumulation is Muon

The CASPR optimizer, a variant of Shampoo, reduces to Muon when we remove the accumulation on the preconditioners.

February 13, 2025 · Franz Louis Cesista

GRPO's Main Flaw

GRPO may not be the best choice for training reasoning models. Here’s why.

February 11, 2025 · Franz Louis Cesista

(Linear) Attention as Test-Time Regression

A unifying framework for linear attention mechanisms as test-time regression and how to parallelize training and inference.

January 27, 2025 · Franz Louis Cesista

Deep Learning Optimizers as Steepest Descent in Normed Spaces

Instead of asking, ‘Which optimizer should I use?’ ask, ‘In which space do my features live?’

October 20, 2024 · Franz Louis Cesista

ChatGPT May Have Developed Seasonal Depression

Could ChatGPT’s shorter responses be an indication of something more bizarre going on?

December 16, 2023 · Franz Louis Cesista