This approach builds on Keller Jordan's Muon optimizer, which replaced AdamW's element-wise parameter updates with steepest descent under the spectral norm.
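
To make "steepest descent under the spectral norm" concrete, here is a minimal sketch for a single weight matrix. For a gradient with thin SVD `G = U diag(s) V^T`, the unit-spectral-norm direction with the largest inner product with `G` is `U V^T`: the gradient with every singular value set to 1. The function name and the use of an exact SVD are assumptions made for illustration. Muon as publicly described approximates this orthogonalization with a Newton-Schulz iteration applied to a momentum buffer rather than computing an SVD.

```python
import numpy as np


def spectral_steepest_descent_direction(grad: np.ndarray) -> np.ndarray:
    """Return U @ V^T for grad = U @ diag(s) @ V^T (illustrative helper).

    This is the steepest descent direction under a spectral-norm constraint:
    it keeps the gradient's singular vectors and sets every singular value to 1.
    An exact SVD is used here for clarity only.
    """
    u, _s, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))  # stand-in gradient for a 4x3 weight matrix

update = spectral_steepest_descent_direction(grad)

# All singular values of the update are 1 (up to floating-point error).
singular_values = np.linalg.svd(update, compute_uv=False)

# The inner product <grad, update> equals the nuclear norm of grad, which is
# the largest value attainable by any matrix of spectral norm at most 1.
alignment = float(np.sum(grad * update))
nuclear_norm = float(np.sum(np.linalg.svd(grad, compute_uv=False)))
```

By contrast, an element-wise method rescales each entry of the gradient independently and ignores the matrix structure that the singular vectors capture.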