Hey there! I am trying out an implementation of the Improved WGAN, but I unfortunately get this error when calling `gradient_penalty.backward()`: `RuntimeError: Scatter is not differentiable twice`.

Could someone help? The relevant code is below:

```
gradient_penalty = get_grad_pen(self.Dis_net, X, X_f.cpu().data, lmbda)
gradient_penalty.backward()
```

`get_grad_pen` is defined as follows:

```
def get_grad_pen(Dis_net, real_data, fake_data, lmbda):
    # One epsilon per sample in the batch; the trailing singleton dims
    # let expand_as broadcast it over the remaining dimensions.
    epsilon = t.FloatTensor(real_data.size(0), 1, 1, 1).uniform_(0, 1)
    epsilon = epsilon.expand_as(real_data)  # expand_as is not in-place
    interpolated_data = real_data*epsilon + fake_data*(1 - epsilon)
    interpolated_dataV = V(interpolated_data.cuda(), requires_grad=True)
    gradients = t.autograd.grad(outputs=Dis_net(interpolated_dataV).mean(0).view(1),
                                inputs=interpolated_dataV,
                                create_graph=True, retain_graph=True,
                                only_inputs=True)[0]
    # Flatten each sample's gradient before taking its 2-norm.
    gradients = gradients.view(gradients.size(0), -1)
    grad_pen = ((gradients.norm(2, dim=1) - 1).pow(2)).mean().mul(lmbda)
    return grad_pen
```

`V` is `torch.autograd.Variable`, in case you were wondering.

Also, note that this happens only on a GPU. If the model is run on the CPU instead, i.e. `interpolated_dataV = V(interpolated_data, requires_grad=True)`, the error is not observed.

I think this only happens with multi-GPU support: when I tried without `nn.parallel.data_parallel`, it seems to work.
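For reference, here is a minimal single-device sketch of the same gradient-penalty computation that double-backpropagates without the error, using a toy `torch.nn.Linear` as a stand-in discriminator (the network, shapes, and `lmbda` value are made up for illustration; the real `Dis_net` and data come from the training code above):

```python
import torch

torch.manual_seed(0)
Dis_net = torch.nn.Linear(4, 1)   # toy stand-in discriminator
real_data = torch.randn(8, 4)
fake_data = torch.randn(8, 4)
lmbda = 10.0

# One epsilon per sample, broadcast over the feature dimensions.
epsilon = torch.rand(real_data.size(0), 1).expand_as(real_data)
interpolated = (real_data * epsilon
                + fake_data * (1 - epsilon)).requires_grad_(True)

# First backward pass: gradients of the critic output w.r.t. the
# interpolated inputs, keeping the graph for a second backward.
grads = torch.autograd.grad(outputs=Dis_net(interpolated).mean(),
                            inputs=interpolated,
                            create_graph=True, retain_graph=True)[0]

# Penalize deviation of each per-sample gradient norm from 1.
grad_pen = ((grads.view(grads.size(0), -1).norm(2, dim=1) - 1)
            .pow(2)).mean() * lmbda
grad_pen.backward()  # second backward succeeds on a single device
print(grad_pen.item())
```

On a single device the double backward goes through; it is only the scatter/gather machinery of `nn.parallel.data_parallel` that raises the error.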