Convergence Bounds for Steepest Descent Under Arbitrary Norms
First-order steepest descent under an arbitrary norm, combined with Nesterov momentum and decoupled weight decay, admits a universal convergence bound. Our results extend to norms not induced by an inner product and also account for batch size.
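As a concrete illustration, the update rule analyzed here can be sketched in a few lines: the steepest-descent direction under a norm is the unit-ball maximizer of the inner product with the gradient, combined with a PyTorch-style Nesterov momentum approximation and AdamW-style decoupled weight decay. This is a minimal sketch, not the paper's exact algorithm; the function names (`lmo`, `step`), the specific norms shown (`l2`, `linf`), and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def lmo(g, norm):
    """Steepest-descent direction for a given norm:
    argmax_{||d|| <= 1} <g, d> over the unit ball of that norm.
    Only two example norms are implemented here (an assumption)."""
    if norm == "l2":
        # Euclidean norm: direction is the normalized gradient.
        return g / (np.linalg.norm(g) + 1e-12)
    if norm == "linf":
        # l-infinity norm: the maximizer is the elementwise sign.
        return np.sign(g)
    raise ValueError(f"unsupported norm: {norm}")

def step(w, g, m, lr=0.1, beta=0.9, wd=0.01, norm="l2"):
    """One optimizer step: Nesterov-style momentum (PyTorch approximation),
    norm-restricted steepest-descent direction, decoupled weight decay."""
    m = beta * m + g               # momentum buffer update
    d = lmo(g + beta * m, norm)    # Nesterov lookahead gradient -> unit direction
    w = w - lr * d - lr * wd * w   # decoupled (AdamW-style) weight decay
    return w, m
```

On a toy quadratic f(w) = ½‖w‖², where the gradient is simply w, repeated calls drive the iterate toward the origin; because the direction is normalized, each step has fixed length `lr`, so the iterate settles in a neighborhood of the minimizer rather than converging exactly.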