NAdamW and Demon optimizers

Hey everyone,

While looking deeper into optimizers, I wanted to train my model using NAdamW (a mix between NAdam and AdamW), but I could not find any PyTorch implementation, only a Keras one. I'd appreciate any help implementing it.
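
To make it concrete, here is roughly what I mean by NAdamW. This is only a minimal sketch on a single scalar parameter, assuming the simplified NAdam update (no momentum-decay schedule) combined with AdamW-style decoupled weight decay. The function name `nadamw_step` and the default values are my own placeholders, not taken from any library.

```python
import math

def nadamw_step(param, grad, m, v, step,
                lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=1e-2):
    """One hypothetical NAdamW update for a scalar parameter (step starts at 1)."""
    # AdamW part: decay the weight directly instead of adding it to the gradient.
    param = param * (1.0 - lr * weight_decay)

    # Adam moment estimates.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # NAdam part: Nesterov-style look-ahead on the bias-corrected first moment.
    m_hat = (beta1 * m / (1.0 - beta1 ** (step + 1))
             + (1.0 - beta1) * grad / (1.0 - beta1 ** step))
    v_hat = v / (1.0 - beta2 ** step)

    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

A real optimizer would of course do this per tensor and keep `m`, `v`, and `step` in the optimizer state; the sketch is only meant to show where the two ideas combine.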

As for DEMON (Decaying Momentum), I did find a PyTorch implementation that looks promising, but I could not find any results or comparisons. Has anyone tried it before, or can anyone point me to another implementation?
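
For context, this is the momentum schedule as I understand it from the Demon paper: the momentum starts at `beta_init` and decays to 0 by the final step. It is a sketch under that assumption, and `demon_beta` is just a name I picked for illustration.

```python
def demon_beta(step, total_steps, beta_init=0.9):
    """Decayed momentum coefficient at a given step (0 <= step <= total_steps)."""
    remaining = 1.0 - step / total_steps  # fraction of training still left
    return beta_init * remaining / ((1.0 - beta_init) + beta_init * remaining)
```

The idea would be to recompute this value every iteration and use it in place of the fixed momentum (or `beta1` in Adam-style optimizers).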