This approach builds on Keller Jordan's Muon optimizer, which replaced AdamW's element-wise parameter updates with steepest descent under the spectral norm.
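For a weight matrix whose gradient has singular value decomposition G = U S Vᵀ, the steepest-descent direction under the spectral norm is U Vᵀ: it keeps the gradient's singular vectors and sets every singular value to one. The sketch below, which assumes NumPy and uses hypothetical helper names (`orthogonalize_svd`, `orthogonalize_newton_schulz`, `spectral_descent_step`), computes that direction exactly via SVD and approximately via a Newton–Schulz iteration that needs only matrix products. It is an illustration, not Muon itself: the published Muon applies a tuned quintic Newton–Schulz polynomial to momentum-averaged gradients, whereas this sketch omits momentum and uses the textbook cubic iteration.

```python
import numpy as np


def orthogonalize_svd(G):
    """Exact spectral-norm steepest-descent direction: U @ V^T from the SVD of G."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def orthogonalize_newton_schulz(G, steps=25):
    """Approximate U @ V^T with the classic cubic Newton-Schulz iteration.

    Dividing by the Frobenius norm bounds the spectral norm by 1, keeping every
    singular value inside the iteration's convergence region (0, sqrt(3)).
    Each step maps every singular value x to 1.5*x - 0.5*x**3, driving it to 1
    while leaving the singular vectors unchanged.
    """
    X = G / np.linalg.norm(G)  # Frobenius norm (the default for 2-D arrays)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def spectral_descent_step(W, G, lr):
    """One momentum-free steepest-descent step under the spectral norm (hypothetical helper)."""
    return W - lr * orthogonalize_newton_schulz(G)


# Build a 4x8 "gradient" with known singular values so the result can be checked.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # 4x4 orthogonal
V, _ = np.linalg.qr(rng.standard_normal((8, 4)))  # 8x4 with orthonormal columns
s = np.array([3.0, 1.0, 0.5, 0.2])
G = U @ np.diag(s) @ V.T

O = orthogonalize_newton_schulz(G)
print(np.allclose(O, orthogonalize_svd(G), atol=1e-6))  # True
print(np.linalg.svd(O, compute_uv=False))  # every singular value is approximately 1.0
```

Two properties make this the spectral-norm steepest-descent direction: O has spectral norm 1, and its inner product with G equals the nuclear norm of G (the sum of its singular values, 4.7 here), which is the largest value ⟨G, D⟩ can reach over all D with spectral norm at most 1. An element-wise optimizer such as AdamW, by contrast, rescales each entry independently and ignores this matrix structure.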