I’ve run into a problem where the gradients of a conv2d layer behave nondeterministically on GPU. It only happens on GPU, and only with certain hyperparameters.

Specifically, for the same input and same network, the gradients are not exactly the same every time.

```
import torch
import numpy as np
from torch.autograd import Variable
import torch.nn as nn

NumChannels = 32


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(
            NumChannels, NumChannels, kernel_size=3, stride=1)
        self.conv2 = nn.Conv2d(
            NumChannels, NumChannels, kernel_size=3, stride=2)

    def forward(self, x):
        out = x
        out = self.conv1(out)
        out = self.conv2(out)
        return out


if __name__ == '__main__':
    batch_size = 11
    np.random.seed(6)
    torch.manual_seed(6666)
    inputs = np.random.uniform(0, 1, size=(batch_size, NumChannels, 32, 32))
    inputs = torch.from_numpy(inputs.astype(np.float32))
    model = Net()
    model.eval()
    prev_gradsum = 0
    prev_outputsum = 0
    model.cuda()
    inputs = inputs.cuda()
    # Same input and same weights on every iteration, so the output sum and
    # the input-gradient sum are expected to match the previous iteration.
    for ii in range(100):
        xvar = Variable(inputs, requires_grad=True)
        output = model(xvar)
        loss = (output ** 4).sum()
        loss.backward()
        if prev_gradsum != 0:
            assert xvar.grad.data.sum() == prev_gradsum, \
                (xvar.grad.data.sum(), prev_gradsum, ii)
            assert output.data.sum() == prev_outputsum, prev_outputsum
        prev_gradsum = xvar.grad.data.sum()
        prev_outputsum = output.data.sum()
```

The gradient assertion fails with:

```
Traceback (most recent call last):
  File "test_wrn.py", line 50, in <module>
    (xvar.grad.data.sum(), prev_gradsum, ii)
AssertionError: (963.59619140625, 963.5963134765625, 48)
```
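For a sense of scale, here is a small standard-library sketch that measures how far apart the two gradient sums in the AssertionError are. It is not part of the repro script. The helper name `float32_bits` and the `1e-6` tolerance are my own choices for illustration, and it assumes both sums are float32 values, which is what the script produces.

```
import math
import struct


def float32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# The two values reported in the AssertionError above.
a = 963.59619140625
b = 963.5963134765625

# For two positive float32 values, the difference between their bit
# patterns is the number of representable float32 steps (ULPs) between them.
ulp_distance = float32_bits(b) - float32_bits(a)

# Relative difference, and a tolerance-based comparison as an alternative
# to the exact == used in the script (1e-6 is an arbitrary illustrative value).
rel_diff = abs(b - a) / abs(a)
close_enough = math.isclose(a, b, rel_tol=1e-6)

print(ulp_distance, rel_diff, close_enough)
```

The two sums are 2 float32 ULPs apart, so the mismatch is in the last bits of the result rather than a large numerical difference.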

This does NOT happen on CPU, and it does NOT happen on P100 cards in an IBM Minsky Power8 machine. It does happen on 1080 Ti cards in several different Intel machines.

We used PyTorch 0.4.0a0+b21e135 on both the 1080 Ti and the P100, and also 0.3.1.post2 on the 1080 Ti.

Is this something to be expected?