Hello,

I am trying to get the gradient for each input in a batch simultaneously.

More specifically, I need the mean of the squared gradients over the inputs in the batch.
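To make that quantity concrete, here is a minimal single-device sketch of one possible reading of it: the gradient of each input's loss with respect to the model parameters, squared element-wise and averaged over the batch. The `nn.Sequential` model is a hypothetical stand-in with the same layer sizes as the script in this post, and the names `sq_grad_sums` and `mean_sq_grads` are made up for illustration. It loops over the batch on the CPU, so it is a reference for the result and not a fast or multi-GPU solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in model with the same layer sizes as the script in this post.
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 2))
inputs = torch.randn(4, 10)
targets = torch.ones(4, dtype=torch.long)
params = list(model.parameters())

# One accumulator per parameter tensor for the squared per-input gradients.
sq_grad_sums = [torch.zeros_like(p) for p in params]

for x, y in zip(inputs, targets):
    # Forward a single input as a batch of size 1.
    log_probs = F.log_softmax(model(x.unsqueeze(0)), dim=1)
    loss = F.nll_loss(log_probs, y.unsqueeze(0))
    # Gradient of this input's loss only; nothing is accumulated in .grad.
    grads = torch.autograd.grad(loss, params)
    for acc, g in zip(sq_grad_sums, grads):
        acc += g ** 2

# Mean over the batch of the element-wise squared per-input gradients.
mean_sq_grads = [acc / inputs.size(0) for acc in sq_grad_sums]
```

Each entry of `mean_sq_grads` has the same shape as the corresponding parameter tensor.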

I think this is possible if the number of GPUs equals the batch size, because each GPU then processes exactly one input of the batch.

Here is a simple script.

```
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F

# Batch of 4 inputs with 10 features each; every target is class 1.
input = Variable(torch.randn(4, 10).cuda())
target = Variable(torch.ones(4).long().cuda())


class dp(nn.Module):
    def __init__(self):
        super(dp, self).__init__()
        self.n1 = nn.Linear(10, 10)
        self.n2 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.n1(x)
        x = F.log_softmax(self.n2(x), dim=1)
        return x


dp = dp().cuda()
dp.zero_grad()
# One replica per GPU, so each GPU receives one input of the batch.
dp = torch.nn.DataParallel(dp, device_ids=[0, 1, 2, 3])
dp.eval()
output = dp(input)
# reduction='none' keeps one loss value per input.
loss = F.nll_loss(output, target, reduction='none')
# Backpropagate every per-input loss.
torch.autograd.backward([element for element in loss])
```

As you can see, 4 GPUs are used and the batch size is 4.

So each GPU computes the gradient for exactly one input.
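As a rough illustration of that one-input-per-GPU split: `DataParallel` divides the input along the batch dimension (dim 0) across the listed devices. The sketch below only mimics that split on the CPU with `torch.chunk`; it is an approximation for intuition, not the actual scatter code, and `num_devices` is an assumed value matching `device_ids=[0, 1, 2, 3]`.

```python
import torch

batch = torch.randn(4, 10)
num_devices = 4  # assumed to match device_ids=[0, 1, 2, 3] in the script

# Split along the batch dimension, one chunk per (hypothetical) device.
chunks = torch.chunk(batch, num_devices, dim=0)
chunk_shapes = [tuple(c.shape) for c in chunks]
```

With these sizes every chunk holds a single input of shape `(1, 10)`. Note that during the backward pass the gradients from all replicas are summed into the parameters of the original module, so `.grad` there holds the sum over the batch and not the per-input values.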

My question is: how can I access the gradient values for each individual input?

Thanks in advance for your help.