nn.Conv2d output varies depending on batch size on CUDA

Hello everyone,

I noticed a strange behavior of the nn.Conv2d layer.
When I pass the same data through the layer, the output varies slightly depending on the input batch size. This happens ONLY when I run the calculations on the GPU.

Here is a code snippet to illustrate the issue.

import torch
import torch.nn as nn

x100 = torch.randn(100, 64, 56, 56)
x1 = x100[0].unsqueeze(0)
my_conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)

# First, do the calculations on the CPU
x1_out = my_conv(x1)
x100_out = my_conv(x100)
print(torch.max(torch.abs(x1_out[0] - x100_out[0])))

# As expected, the last command prints a zero tensor: tensor(0., grad_fn=<MaxBackward1>)

# Now, do the same on the GPU
x1 = x1.to('cuda')
x100 = x100.to('cuda')
my_conv.to('cuda')
x1_out = my_conv(x1)
x100_out = my_conv(x100)
print(torch.max(torch.abs(x1_out[0] - x100_out[0])))

# This time the last command gives a non-zero result: tensor(2.6226e-06, device='cuda:0', grad_fn=<MaxBackward1>)

I am probably missing something, and there is likely an easy explanation for this, but I cannot find one. Does anybody have an idea?

Thanks in advance.

By default, cuDNN can select non-deterministic algorithms. Set torch.backends.cudnn.deterministic = True if you want deterministic behavior, at the cost of some performance.
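
For reference, here is a minimal sketch of how this could look for the snippet above (assuming the flags are set before any CUDA convolution runs; torch.backends.cudnn.benchmark = False is an extra assumption to keep the autotuner from re-selecting algorithms per input shape):

import torch
import torch.nn as nn

# Force cuDNN to pick deterministic convolution algorithms and disable the
# autotuner, trading some speed for reproducible results.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

x100 = torch.randn(100, 64, 56, 56, device='cuda')
x1 = x100[0].unsqueeze(0)
my_conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False).to('cuda')

x1_out = my_conv(x1)
x100_out = my_conv(x100)

# With deterministic algorithms the batch-size dependence should disappear.
print(torch.max(torch.abs(x1_out[0] - x100_out[0])))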


Thank you very much! That indeed solved the problem.