I was looking at the new paper by Geoffrey Hinton and was wondering whether it has been implemented yet. Here is the article:
Cross posted:
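from radam import RAdam
from optimizer import Lookahead

# `model` is assumed to be an existing PyTorch model (anything exposing .parameters())
base_optim = RAdam(model.parameters(), lr=0.001)  # inner ("fast") optimizer
# Wrap the base optimizer: sync the slow weights every k=5 steps, interpolation factor alpha=0.5
optimizer = Lookahead(base_optim, k=5, alpha=0.5)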
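
In case it helps anyone reading along, here is a rough sketch of what the `k` and `alpha` arguments control, as I understand the Lookahead update rule: the inner optimizer takes `k` ordinary steps on the "fast" weights, then the "slow" weights move a fraction `alpha` of the way toward them, and the fast weights are reset to the slow ones. This is not the code from the `optimizer` module above. It swaps RAdam for plain SGD and uses Python lists instead of tensors so it runs without any dependencies, and the names `lookahead_sgd` and `quadratic_grad` are made up for illustration.

```python
def lookahead_sgd(grad_fn, theta0, lr=0.1, k=5, alpha=0.5, outer_steps=20):
    """Hypothetical helper: Lookahead wrapped around plain SGD on a list of floats."""
    slow = list(theta0)  # slow weights, updated once every k inner steps
    fast = list(theta0)  # fast weights, updated by the inner optimizer
    for _ in range(outer_steps):
        # k ordinary SGD steps on the fast weights
        for _ in range(k):
            grads = grad_fn(fast)
            fast = [w - lr * g for w, g in zip(fast, grads)]
        # Move the slow weights a fraction alpha toward the fast weights
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
        # Restart the fast weights from the updated slow weights
        fast = list(slow)
    return slow


def quadratic_grad(w):
    """Gradient of the toy objective f(w) = sum(w_i ** 2)."""
    return [2.0 * wi for wi in w]


result = lookahead_sgd(quadratic_grad, [1.0, -2.0])
print(result)  # both entries should end up close to 0
```

With `alpha=1.0` the slow weights simply copy the fast ones, so this reduces to the inner optimizer alone; smaller `alpha` values pull each synchronization back toward the previous slow weights.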