Hi everyone. I’m trying to implement the elastic averaging stochastic gradient descent (EASGD) algorithm from the paper *Deep Learning with Elastic Averaging SGD* and am running into some trouble.
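For reference, this is the asynchronous EASGD update for worker $i$ as I read it in the paper. The worker parameters are $x^i$, the shared center variable is $\tilde{x}$, the learning rate is $\eta$, the moving rate is $\alpha$, the communication period is $\tau$, the worker's local clock is $t^i$, and the stochastic gradient is $g^i$. In my code these correspond to `lr`, `alpha` and `tau`.

$$
\begin{aligned}
&x \leftarrow x^i \\
&\text{if } \tau \text{ divides } t^i: \quad x^i \leftarrow x^i - \alpha\,(x - \tilde{x}), \quad \tilde{x} \leftarrow \tilde{x} + \alpha\,(x - \tilde{x}) \\
&x^i \leftarrow x^i - \eta\, g^i(x) \\
&t^i \leftarrow t^i + 1
\end{aligned}
$$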

I’m subclassing PyTorch’s `torch.optim.Optimizer` class and using the official implementations of SGD and Accelerated SGD as references to get started.
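Here is the bare pattern as I understand it from the built-in optimizers. This is a momentum-free sketch, and the class name `PlainSGD` is my own invention, not part of PyTorch. The update runs under `torch.no_grad()` and modifies `p` in place, so `p` stays the same leaf tensor the model owns.

```python
import torch
from torch.optim import Optimizer


class PlainSGD(Optimizer):
    """Hypothetical minimal SGD (no momentum); not torch.optim.SGD itself."""

    def __init__(self, params, lr):
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()  # parameter updates must not be recorded by autograd
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():  # the closure itself needs gradients
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # In-place update: p remains the model's leaf tensor.
                p.add_(p.grad, alpha=-group["lr"])
        return loss


# Usage: minimize (w - 3)^2 starting from w = 0.
w = torch.zeros(1, requires_grad=True)
opt = PlainSGD([w], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = ((w - 3) ** 2).sum()
    loss.backward()
    opt.step()
print(w.item())  # close to 3.0
```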

The code that I have is:

```
import torch
import torch.optim as optim


class EASGD(optim.Optimizer):
    def __init__(self, params, lr, tau, alpha=0.001):
        self.alpha = alpha
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate {lr}.")
        defaults = dict(lr=lr, alpha=alpha, tau=tau)
        super(EASGD, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(EASGD, self).__setstate__(state)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            tau = group['tau']
            for t, p in enumerate(group['params']):
                x_normal = p.clone()
                x_tilde = p.clone()
                if p.grad is None:
                    continue
                if t % tau == 0:
                    p = p - self.alpha * (x_normal - x_tilde)
                    x_tilde = x_tilde + self.alpha * (x_normal - x_tilde)
                d_p = p.grad.data
                p.data.add_(d_p, alpha=-group['lr'])
        return loss
```

When I run this code, I get the following warning:

```
/home/user/github/test-repo/easgd.py:50: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
```
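I can reproduce the same warning outside the optimizer with a toy tensor (not my real model). `p - ...` returns a new tensor produced by an operation, so it is not a leaf, and autograd never populates its `.grad`. The line `p = p - self.alpha * (...)` in `step` rebinds `p` in the same way, so the later `p.grad` reads from the non-leaf tensor.

```python
import warnings

import torch

p = torch.nn.Parameter(torch.ones(3))  # a leaf tensor, like a model parameter
p.sum().backward()                     # populates p.grad

# Creating a tensor from p with an out-of-place op gives a non-leaf tensor.
q = p - 0.1 * p
print(p.is_leaf, q.is_leaf)  # True False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    grad = q.grad  # accessing .grad on a non-leaf emits the UserWarning
print(grad)        # None
print(any("not a leaf" in str(w.message) for w in caught))  # True
```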

Reading this PyTorch Discussion helped me understand the difference between leaf and non-leaf variables, but I’m not sure how to fix my code so that it works properly.
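A direction I’ve been sketching is below. It is untested beyond a toy problem and is a single-process simplification, not the paper's distributed setup. The idea is to keep the center variable and a per-parameter step counter in `self.state` so they persist across calls, instead of recreating both from `p` on every call and using the parameter index from `enumerate` as the clock. The elastic difference is computed once, and `p` is updated in place under `torch.no_grad()` rather than rebound. The state keys `center` and `step` are my own names. In real EASGD the center variable would live on a master shared by several workers, which this sketch does not do.

```python
import torch
from torch.optim import Optimizer


class EASGD(Optimizer):
    """Single-worker sketch of asynchronous EASGD. Assumption: the center
    variable is kept in local optimizer state, not on a shared master."""

    def __init__(self, params, lr, tau, alpha=0.001):
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate {lr}.")
        if tau < 1:
            raise ValueError(f"Invalid communication period {tau}.")
        super().__init__(params, dict(lr=lr, tau=tau, alpha=alpha))

    @torch.no_grad()  # the updates below must not be tracked by autograd
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, tau, alpha = group["lr"], group["tau"], group["alpha"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0                     # local clock t^i
                    state["center"] = p.detach().clone()  # persists across steps
                center = state["center"]
                if state["step"] % tau == 0:
                    # Elastic difference, computed once from the current values.
                    diff = p - center
                    p.sub_(diff, alpha=alpha)       # x^i <- x^i - alpha (x - x~)
                    center.add_(diff, alpha=alpha)  # x~  <- x~  + alpha (x - x~)
                # Gradient step; p.grad was computed at the pre-update x.
                p.add_(p.grad, alpha=-lr)
                state["step"] += 1
        return loss


# Usage: minimize (w - 3)^2 starting from w = 0.
w = torch.zeros(1, requires_grad=True)
opt = EASGD([w], lr=0.1, tau=4, alpha=0.1)
for _ in range(300):
    opt.zero_grad()
    loss = ((w - 3) ** 2).sum()
    loss.backward()
    opt.step()
center = opt.state[w]["center"]
print(w.item(), center.item())  # both close to 3.0
```

With a single worker and the center initialized to the parameter, the elastic term is zero on the first step and only matters once the two drift apart.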

Any tips on what to do or where to look are appreciated. Thanks.