1. Introduction
In a previous blog post, we discussed a novel method for clipping singular values of a matrix without the use of expensive singular value decompositions (SVDs). This is useful in deep learning for controlling weight norms, stabilizing training, and potentially enabling more aggressive low-precision training. Following the same technique, we can also clip the eigenvalues of a (symmetric) matrix efficiently. This can be used to efficiently project matrices onto the positive semi-definite cone, which is useful in e.g. finance and quantum mechanics where some equations require matrices to be positive semi-definite.
I have previously communicated this technique to the authors of “Factorization-free Orthogonal Projection onto the Positive Semidefinite Cone with Composite Polynomial Filtering” on ArXiv. I recommend reading their paper!
2. Eigenvalue Clipping
For now, we limit ourselves to symmetric matrices $W \in \mathcal{S}^{n}$ where $\mathcal{S}^{n} = \{W \in \mathbb{R}^{n \times n} | W = W^T\}$ is the set of all $n \times n$ real symmetric matrices. Symmetric matrices have real eigenvalues and can be diagonalized by an orthogonal matrix. We define Eigenvalue Clipping as follows:
Definition 1 (Eigenvalue Clipping). Let $W \in \mathcal{S}^{n}$ be a symmetric matrix and $W = Q \Lambda Q^T$ be its eigenvalue decomposition, where $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$ collects the eigenvalues of $W$, $\lambda_i \in \mathbb{R}$ for all $i$, and $QQ^T = I$. Then we define Eigenvalue Clipping as the following matrix function $\texttt{eig\_clip}_{[\lambda_{min}, \lambda_{max}]}: \mathcal{S}^{n} \to \mathcal{S}^{n}$, $$\begin{equation}\texttt{eig\_clip}_{[\lambda_{min}, \lambda_{max}]}(W) = Q \texttt{clip}_{[\lambda_{min}, \lambda_{max}]}(\Lambda) Q^T\label{1}\end{equation}$$ where $\lambda_{min}, \lambda_{max} \in \mathbb{R} \cup \{-\infty, \infty\}$ with $\lambda_{min} \leq \lambda_{max}$ are hyperparameters that control the minimum and maximum attainable eigenvalues of the resulting matrix, and $\texttt{clip}_{[\alpha, \beta]}: \mathbb{R} \to \mathbb{R}$ is applied element-wise to the eigenvalues of $W$,
$$\begin{equation}\texttt{clip}_{[\alpha, \beta]}(x) = \begin{cases} \alpha & \texttt{if } x < \alpha \\ x & \texttt{if } \alpha \leq x \leq \beta \\ \beta & \texttt{if } \beta < x \end{cases}\end{equation}$$ where $\alpha, \beta \in \mathbb{R} \cup \{-\infty, \infty\}$ and $\alpha \leq \beta$.
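Spelled out directly, Definition 1 corresponds to a naive, eigendecomposition-based implementation. The sketch below (the function name `eig_clip_reference` is mine) is mainly useful as a ground truth to test the factorization-free versions derived below against:
import jax
import jax.numpy as jnp

def eig_clip_reference(W: jax.Array, lo: float = -1., hi: float = 1.) -> jax.Array:
    # Direct implementation of Definition 1 via an explicit eigendecomposition.
    # Expensive and precision-hungry; shown only as a testing reference.
    eigvals, Q = jnp.linalg.eigh(W)      # W = Q diag(eigvals) Q^T
    clipped = jnp.clip(eigvals, lo, hi)  # clip the eigenvalues element-wise
    return (Q * clipped) @ Q.T           # Q diag(clipped) Q^T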
The naive implementation of this requires computing the eigenvalue decomposition of $W$, which is computationally expensive and requires high numerical precision (typically float32). Instead, we make use of the GPU/TPU-friendly method by Jordan et al. (2024) for computing the matrix sign function $\texttt{msign}$, together with the following identity from the previous blog post:
Proposition 2 (Computing $\texttt{clip}$ via $\texttt{sign}$). Let $\alpha, \beta \in \mathbb{R} \cup \{-\infty, \infty\}$ and $\texttt{clip}: \mathbb{R} \to \mathbb{R}$ be the clipping function defined in Definition 1. Then, $$\begin{equation}\texttt{clip}_{[\alpha, \beta]}(x) = \frac{\alpha + \beta + (\alpha - x)\texttt{sign}(\alpha - x) - (\beta - x)\texttt{sign}(\beta - x)}{2}\label{3}\end{equation}$$
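As a quick numerical sanity check of this identity (the snippet below is not part of the derivation), we can compare it against `jnp.clip` on a few values:
import jax.numpy as jnp

x = jnp.linspace(-3., 3., 7)
alpha, beta = -1., 1.
via_sign = (alpha + beta
            + (alpha - x) * jnp.sign(alpha - x)
            - (beta - x) * jnp.sign(beta - x)) / 2
assert jnp.allclose(via_sign, jnp.clip(x, alpha, beta))  # holds element-wise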
2.1 Lifting to matrix form
We can lift Equation 3 to matrix form as follows:
$$\begin{align} \texttt{eig\_clip}_{[\alpha, \beta]}(W) &= Q \texttt{clip}_{[\alpha, \beta]}(\Lambda) Q^T\nonumber\\ &= Q \frac{(\alpha + \beta) I + (\alpha I - \Lambda)\texttt{sign}(\alpha I - \Lambda) - (\beta I - \Lambda)\texttt{sign}(\beta I - \Lambda)}{2} Q^T\nonumber\\ &= \frac{1}{2} [(\alpha + \beta) QQ^T\nonumber\\ &\qquad+ Q (\alpha I - \Lambda ) \texttt{sign}(\alpha I - \Lambda) Q^T\nonumber\\ &\qquad- Q (\beta I - \Lambda ) \texttt{sign}(\beta I - \Lambda) Q^T]\nonumber\\ &= \frac{1}{2} [(\alpha + \beta) I\nonumber\\ &\qquad+ Q (\alpha I - \Lambda ) (Q^T Q) \texttt{sign}(\alpha I - \Lambda) Q^T\nonumber\\ &\qquad- Q (\beta I - \Lambda ) (Q^T Q) \texttt{sign}(\beta I - \Lambda) Q^T]\nonumber\\ \texttt{eig\_clip}_{[\alpha, \beta]}(W) &= \frac{1}{2} [(\alpha + \beta) I + (\alpha I - W ) \texttt{msign}(\alpha I - W) - (\beta I - W ) \texttt{msign}(\beta I - W)] \end{align} $$
which we can implement in JAX as follows:
def eig_clip(W: jax.Array, alpha: float = -1., beta: float = 1.) -> jax.Array:
    # W is assumed to be symmetric; _orthogonalize_via_newton_schulz computes
    # msign via a Newton-Schulz iteration (Jordan et al., 2024).
    I = jnp.eye(W.shape[0])
    return (1/2) * (
        (alpha + beta) * I
        + (alpha * I - W) @ _orthogonalize_via_newton_schulz(alpha * I - W)
        - (beta * I - W) @ _orthogonalize_via_newton_schulz(beta * I - W)
    )
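Note that `_orthogonalize_via_newton_schulz` is not defined in this post; it is expected to compute $\texttt{msign}$ of its input. A minimal sketch, adapted from the Newton-Schulz iteration used in Muon (Jordan et al., 2024), is given below; the coefficients and iteration count are one common choice and can be tuned for higher accuracy:
import jax
import jax.numpy as jnp

def _orthogonalize_via_newton_schulz(W: jax.Array, steps: int = 10) -> jax.Array:
    # Odd quintic iteration X <- a*X + b*(X X^T) X + c*(X X^T)^2 X, which pushes
    # the singular values of X toward 1; for symmetric W it approaches msign(W).
    a, b, c = 3.4445, -4.7750, 2.0315    # coefficients from Jordan et al. (2024)
    X = W / (jnp.linalg.norm(W) + 1e-7)  # Frobenius norm >= spectral norm, so ||X||_2 <= 1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X
With this helper in place, `eig_clip` can be checked against the naive `eig_clip_reference` from earlier; the error depends on how accurately the Newton-Schulz iteration approximates $\texttt{msign}$:
key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (8, 8))
W = (A + A.T) / 2  # random symmetric test matrix
print(jnp.max(jnp.abs(eig_clip(W) - eig_clip_reference(W))))  # small if msign is accurate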
2.2 Eigenvalue Hardcapping
Suppose we have symmetric matrices $W$ as weights in a neural network and we want to guarantee that the weights do not blow up during training. We can do this by capping the eigenvalues of $W$ to a maximum value $\beta$ after each gradient update. To do this, we can set $\alpha = -\infty$ in Equation 3:
$$\begin{align} \texttt{clip}_{[-\infty, \beta]}(x) &= \lim_{\alpha \to -\infty}\frac{\alpha + \beta + (\alpha - x)\texttt{sign}(\alpha - x) - (\beta - x)\texttt{sign}(\beta - x)}{2}\nonumber\\ &= \frac{\cancel{\alpha} + \beta - \cancel{\alpha} + x - (\beta - x)\texttt{sign}(\beta - x)}{2}\nonumber\\ \texttt{clip}_{[-\infty, \beta]}(x) &= \frac{\beta + x - (\beta - x)\texttt{sign}(\beta - x)}{2} \end{align}$$
Lifting this to matrix form yields,
$$\begin{align} \texttt{eig\_hardcap}_\beta(W) &= \texttt{eig\_clip}_{[-\infty, \beta]}(W) \nonumber\\ &= Q \texttt{clip}_{[-\infty, \beta]}(\Lambda) Q^T \nonumber\\ \texttt{eig\_hardcap}_\beta(W) &= \frac{1}{2} [\beta I + W - (\beta I - W) \texttt{msign}(\beta I - W)] \end{align}$$
which we can implement in JAX as follows:
def eig_hardcap(W: jax.Array, beta: float = 1.) -> jax.Array:
    # Caps the eigenvalues of the symmetric matrix W from above at beta.
    I = jnp.eye(W.shape[0])
    return (1/2) * (
        beta * I + W
        - (beta * I - W) @ _orthogonalize_via_newton_schulz(beta * I - W)
    )
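As a quick check (reusing the imports and random symmetric test matrix construction from the snippets above), the largest eigenvalue of the result should not exceed $\beta$, up to the approximation error of $\texttt{msign}$:
A = jax.random.normal(jax.random.PRNGKey(1), (8, 8))
W = (A + A.T) / 2
print(jnp.max(jnp.linalg.eigvalsh(W)))                        # typically well above 1
print(jnp.max(jnp.linalg.eigvalsh(eig_hardcap(W, beta=1.))))  # approximately <= 1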
2.3 Eigenvalue ReLU (and projection to the PSD cone)
Suppose we want to bound the eigenvalues of $W$ from below by a minimum value $\alpha$. For $\alpha = 0$, this is equivalent to projecting $W$ onto the positive semi-definite cone which is useful in e.g. finance and quantum mechanics where objects are typically required to be positive semi-definite. We can do this by setting $\beta = +\infty$ in Equation 3:
$$\begin{align} \texttt{clip}_{[\alpha, \infty]}(x) &= \lim_{\beta \to \infty}\frac{\alpha + \beta + (\alpha - x)\texttt{sign}(\alpha - x) - (\beta - x)\texttt{sign}(\beta - x)}{2}\nonumber\\ &= \frac{\alpha + \cancel{\beta} + (\alpha - x)\texttt{sign}(\alpha - x) - (\cancel{\beta} - x)}{2}\nonumber\\ \texttt{clip}_{[\alpha, \infty]}(x) &= \frac{\alpha + x + (\alpha - x)\texttt{sign}(\alpha - x)}{2} \end{align}$$
Lifting this to matrix form yields,
$$\begin{align} \texttt{eig\_relu}_\alpha(W) &= \texttt{eig\_clip}_{[\alpha, \infty]}(W)\nonumber\\ &= Q \texttt{clip}_{[\alpha, \infty]}(\Lambda) Q^T\nonumber\\ \texttt{eig\_relu}_\alpha(W) &= \frac{1}{2} [\alpha I + W + (\alpha I - W) \texttt{msign}(\alpha I - W)] \end{align}$$
which we can implement in JAX as follows:
def eig_relu(W: jax.Array, alpha: float = 0.) -> jax.Array:
    # Bounds the eigenvalues of the symmetric matrix W from below by alpha.
    I = jnp.eye(W.shape[0])
    return (1/2) * (
        alpha * I + W
        + (alpha * I - W) @ _orthogonalize_via_newton_schulz(alpha * I - W)
    )
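Analogously to the hardcap check above, the smallest eigenvalue of the result should be at least $\alpha$, up to the approximation error of $\texttt{msign}$:
A = jax.random.normal(jax.random.PRNGKey(2), (8, 8))
W = (A + A.T) / 2
print(jnp.min(jnp.linalg.eigvalsh(W)))                         # typically negative
print(jnp.min(jnp.linalg.eigvalsh(eig_relu(W, alpha=0.5))))    # approximately >= 0.5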
For the projection to the positive semi-definite cone, we set $\alpha = 0$:
$$\begin{aligned} \texttt{project\_psd}(W) &= \texttt{eig\_relu}_0(W) \\ &= \frac{1}{2} [0 + W + (0 - W) \texttt{msign}(0 - W)] \\ \texttt{project\_psd}(W) &= \frac{1}{2} [W + W \texttt{msign}(W)] \end{aligned}$$
which we can implement in JAX as follows:
def project_psd(W: jax.Array) -> jax.Array:
    # Equivalent to eig_relu with alpha = 0: zeroes out the negative eigenvalues of W.
    return (1/2) * (W + W @ _orthogonalize_via_newton_schulz(W))
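A natural check (again reusing the setup from the earlier snippets) is that the eigenvalues of the output are approximately nonnegative and that the result matches the exact eigendecomposition-based projection:
A = jax.random.normal(jax.random.PRNGKey(3), (8, 8))
W = (A + A.T) / 2
P = project_psd(W)
eigvals, Q = jnp.linalg.eigh(W)
P_ref = (Q * jnp.maximum(eigvals, 0.)) @ Q.T  # exact projection via eigendecomposition
print(jnp.min(jnp.linalg.eigvalsh(P)))        # approximately >= 0
print(jnp.max(jnp.abs(P - P_ref)))            # small if msign is accurate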
3. [Under Construction] Steepest descent on the PSD cone
How to Cite
@misc{cesista2025eigclipping,
author = {Franz Louis Cesista},
title = {Fast, Numerically Stable, and Auto-Differentiable {E}igenvalue {C}lipping via {N}ewton-{S}chulz Iteration},
year = {2025},
month = {October},
day = {2},
url = {http://leloykun.github.io/ponder/eigenvalue-clipping/},
}
If you find this post useful, please consider supporting my work by sponsoring me on GitHub.
References
- Franz Cesista (2025). Fast, Numerically Stable, and Auto-Differentiable Spectral Clipping via Newton-Schulz Iteration. URL http://leloykun.github.io/ponder/spectral-clipping/
- Shucheng Kang, Haoyu Han, Antoine Groudiev, Heng Yang (2025). Factorization-free Orthogonal Projection onto the Positive Semidefinite Cone with Composite Polynomial Filtering. URL https://arxiv.org/abs/2507.09165
- Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Franz Cesista, Laker Newhouse, and Jeremy Bernstein (2024). Muon: An optimizer for hidden layers in neural networks. URL https://kellerjordan.github.io/posts/muon/