Hi everyone. I’m trying to implement the elastic averaging stochastic gradient descent (EASGD) algorithm from the paper Deep Learning with Elastic Averaging SGD and am running into some trouble.
I’m subclassing PyTorch’s torch.optim.Optimizer class and referencing the official implementations of SGD and Accelerated SGD as a starting point.
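For context, the update I’m trying to implement (as I understand the paper) looks roughly like the sketch below for a single worker: x is the local parameter, x_tilde is the center variable, alpha is the elastic coefficient, and the elastic step happens every tau iterations. This is just my reading of the algorithm, not code from the paper.

    import torch

    def easgd_update(x, x_tilde, grad, lr, alpha, t, tau):
        # Elastic step every tau iterations: pull the local variable and the
        # center variable toward each other by alpha * (x - x_tilde).
        if t % tau == 0:
            diff = x - x_tilde
            x = x - alpha * diff
            x_tilde = x_tilde + alpha * diff
        # Ordinary SGD step on the local variable.
        x = x - lr * grad
        return x, x_tilde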
The code that I have is:
    import torch
    import torch.optim as optim

    class EASGD(optim.Optimizer):
        def __init__(self, params, lr, tau, alpha=0.001):
            self.alpha = alpha
            if lr < 0.0:
                raise ValueError(f"Invalid learning rate {lr}.")
            defaults = dict(lr=lr, alpha=alpha, tau=tau)
            super(EASGD, self).__init__(params, defaults)

        def __setstate__(self, state):
            super(EASGD, self).__setstate__(state)

        def step(self, closure=None):
            loss = None
            if closure is not None:
                with torch.enable_grad():
                    loss = closure()

            for group in self.param_groups:
                tau = group['tau']
                for t, p in enumerate(group['params']):
                    # local copy and "center variable" copy of the parameter
                    x_normal = p.clone()
                    x_tilde = p.clone()
                    if p.grad is None:
                        continue
                    # elastic/communication step (t here is the parameter index in the group)
                    if t % tau == 0:
                        p = p - self.alpha * (x_normal - x_tilde)
                        x_tilde = x_tilde + self.alpha * (x_normal - x_tilde)
                    # plain SGD step
                    d_p = p.grad.data  # (I believe this is the .grad access the warning points at)
                    p.data.add_(d_p, alpha=-group['lr'])
            return loss
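For reference, I’m constructing and stepping the optimizer roughly like this (the tiny model and random data are just stand-ins for my real training loop):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                  # placeholder model
    opt = EASGD(model.parameters(), lr=0.01, tau=4, alpha=0.001)

    x = torch.randn(32, 10)                   # placeholder batch
    y = torch.randn(32, 1)

    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()                                # the warning shows up here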
When I run this code, I get the following error:
/home/user/github/test-repo/easgd.py:50: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
Reading this PyTorch discussion helped me understand the difference between leaf and non-leaf tensors, but I’m not sure how I should fix my code to make it work properly.
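For example, this toy snippet (my own test, not from my training code) seems to show the same thing: as soon as I rebind p the way my step() does, it stops being a leaf, and accessing its .grad emits the warning:

    import torch

    p = torch.nn.Parameter(torch.randn(3))    # leaf tensor
    loss = (p * 2).sum()
    loss.backward()
    print(p.is_leaf, p.grad)                  # True, tensor([2., 2., 2.])

    x_normal = p.clone()
    x_tilde = p.clone()
    p = p - 0.001 * (x_normal - x_tilde)      # same rebinding as in my step()
    print(p.is_leaf)                          # False
    print(p.grad)                             # None, and this access emits the UserWarning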
Any tips on what to do or where to look are appreciated. Thanks.