Steepest Descent on Finsler-Structured (Matrix) Geometries via Dual Ascent
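To make model training fast and robust, we can recast the optimization problem as steepest descent on Finsler-structured geometries, that is, geometries whose tangent spaces are equipped with a norm that need not be induced by an inner product. Here we show how to compute the optimal updates via dual ascent.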
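As a rough illustration of the dual ascent idea, here is a minimal toy sketch. It assumes one common way of posing the update problem, which is not taken from the text above: minimize the inner product <g, a> over updates a with norm at most eta, subject to a single linear tangent-space constraint <w, a> = 0. The Euclidean norm is used purely because the inner minimization then has a closed form and the dual is smooth; the norms of interest here would replace it. The names `inner_minimizer`, `dual_ascent`, `eta`, `step`, and `iters` are hypothetical.

```python
import math

def inner_minimizer(g, w, lam, eta):
    """Closed-form argmin of <g + lam*w, a> over the ball ||a||_2 <= eta.

    Assumes g + lam*w is nonzero (true for the toy data below).
    """
    v = [gi + lam * wi for gi, wi in zip(g, w)]
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return [-eta * vi / norm_v for vi in v]

def dual_ascent(g, w, eta=1.0, step=2.0, iters=100):
    """Gradient ascent on the dual variable lam of the constraint <w, a> = 0."""
    lam = 0.0
    for _ in range(iters):
        a = inner_minimizer(g, w, lam, eta)
        # The dual gradient is the constraint violation <w, a>.
        residual = sum(wi * ai for wi, ai in zip(w, a))
        lam += step * residual
    return inner_minimizer(g, w, lam, eta), lam

# Toy data (assumed for illustration): gradient g and constraint normal w.
g = [3.0, 4.0, 0.0]
w = [1.0, 0.0, 0.0]
update, lam = dual_ascent(g, w)
```

On this toy problem the multiplier approaches -3 and the update approaches [0, -1, 0], which is the normalized negative of g with its component along w removed. With a non-Euclidean norm the dual is generally not smooth, so a fixed step size as used here may not be appropriate.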