1 comment

  • cs702 19 minutes ago
    A superior alternative to standard Muon and AdamW optimizers for training large models.

    Fantastic work, instantly valuable, immediately usable.

    A big THANK YOU to the authors:

    Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao