Hello everyone,

I have noticed some strange behavior in the `nn.Conv2d` layer.

When I pass the same data through the layer, the output varies slightly depending on the input batch size. This happens ONLY when I run the computation on the GPU.
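To make the comparison easier to repeat on any machine, here is a small sketch. `batch_consistency_gap` is a hypothetical helper name, and the seed and batch size of 16 are arbitrary choices. The helper runs sample 0 through the layer on its own and again as part of the full batch, then reports the largest absolute difference on each available device. It falls back to CPU only when no GPU is present.

```python
import torch
import torch.nn as nn


def batch_consistency_gap(layer: nn.Module, batch: torch.Tensor, device: str) -> float:
    """Hypothetical helper: max |difference| between sample 0 run alone and
    sample 0 run inside the full batch, computed on `device`."""
    layer = layer.to(device)  # nn.Module.to moves the parameters in place
    batch = batch.to(device)
    with torch.no_grad():  # gradients are not needed for this comparison
        single_out = layer(batch[:1])  # batch of size 1, like unsqueeze(0)
        batch_out = layer(batch)
    return (single_out[0] - batch_out[0]).abs().max().item()


torch.manual_seed(0)  # arbitrary seed so reruns use the same weights and data
conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)
x = torch.randn(16, 64, 56, 56)

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
gaps = {}
for dev in devices:
    gaps[dev] = batch_consistency_gap(conv, x, dev)
    print(f"{dev}: max abs difference = {gaps[dev]:.3e}")
```

If bit-exact equality is not actually required, `torch.allclose(a, b, atol=..., rtol=...)` is a common way to compare outputs like these with a tolerance instead of testing for exact zero.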

Here is a code snippet that reproduces the issue:

```python
import torch
import torch.nn as nn
torch.manual_seed(0)  # make the random weights and inputs reproducible
x100 = torch.randn(100, 64, 56, 56)
x1 = x100[0].unsqueeze(0)  # the first sample on its own, as a batch of size 1
my_conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)
# First, run both batches through the layer on the CPU
x1_out = my_conv(x1)
x100_out = my_conv(x100)
print(torch.max(torch.abs(x1_out[0] - x100_out[0])))
# As expected, this prints zero: tensor(0., grad_fn=<MaxBackward1>)
# Now repeat the same computation on the GPU
x1 = x1.to('cuda')
x100 = x100.to('cuda')
my_conv.to('cuda')  # moves the module's parameters in place
x1_out = my_conv(x1)
x100_out = my_conv(x100)
print(torch.max(torch.abs(x1_out[0] - x100_out[0])))
# This prints a small non-zero value: tensor(2.6226e-06, device='cuda:0', grad_fn=<MaxBackward1>)
```
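One way to narrow this down, offered as a diagnostic sketch rather than a known fix, is to rerun the GPU comparison under different cuDNN settings and see whether the gap changes. The flags used (`torch.backends.cudnn.enabled`, `.benchmark`, `.deterministic`) are real PyTorch switches. `deterministic` is documented for run-to-run reproducibility, and whether it also makes different batch sizes agree is exactly what this check would show. The setting names are my own labels, and the original flag values are restored afterwards.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)

results = {}
if torch.cuda.is_available():
    conv = conv.to("cuda")
    x100 = torch.randn(100, 64, 56, 56, device="cuda")
    # Each entry: (cudnn.enabled, cudnn.benchmark, cudnn.deterministic)
    settings = {
        "default": (True, False, False),  # PyTorch's default flag values
        "deterministic": (True, False, True),
        "cudnn_disabled": (False, False, False),
    }
    saved = (
        torch.backends.cudnn.enabled,
        torch.backends.cudnn.benchmark,
        torch.backends.cudnn.deterministic,
    )
    try:
        for name, (enabled, benchmark, deterministic) in settings.items():
            torch.backends.cudnn.enabled = enabled
            torch.backends.cudnn.benchmark = benchmark
            torch.backends.cudnn.deterministic = deterministic
            with torch.no_grad():
                gap = (conv(x100[:1])[0] - conv(x100)[0]).abs().max().item()
            results[name] = gap
            print(f"{name}: max abs difference = {gap:.3e}")
    finally:
        # Put the global cuDNN flags back the way they were
        (
            torch.backends.cudnn.enabled,
            torch.backends.cudnn.benchmark,
            torch.backends.cudnn.deterministic,
        ) = saved
else:
    print("CUDA is not available; nothing to compare on the GPU.")
```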

I am probably missing something and there is likely a simple explanation for this, but I cannot find one… Does anybody have an idea?

Thanks in advance.