cdemirsoy
(Canberk)
October 10, 2018, 2:40pm
1
There is a precision difference between the convolutions executed on the CPU and on the GPU using nn.Conv2d().
In the worst case, the results of a forward pass on GPU and CPU agree only up to 3 digits. If the output channel count is 1, the convolution must sum over the input channels, which lowers precision further: when the number of input channels is greater than one and output channels and groups are both one, the GPU and CPU forward-pass results agree only up to 1 digit.
What is the reason for that?
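(For context, a likely explanation is floating-point non-associativity: CPU and GPU kernels accumulate the convolution sums in different orders, so rounding errors differ. A minimal sketch in plain Python, using float64 literals I chose for illustration, shows that even reordering three additions changes the result:)

```python
# Floating-point addition is not associative: the order of
# accumulation changes the rounding error, so two correct
# implementations that sum in different orders can disagree
# in the last digits.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False
```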
albanD
(Alban D)
October 10, 2018, 2:49pm
2
That sounds like too big a difference. Did you check that on a single conv layer?
Could you send a small code sample to reproduce this please?
cdemirsoy
(Canberk)
October 10, 2018, 3:44pm
3
Yes, sure. I checked on a single conv layer (nn.Conv2d) with random hyperparameters.
import random
from random import randint

import torch
import torch.nn as nn

num_iter = 6000
torch.set_printoptions(precision=6)
for i in range(num_iter):
    padVal = round(random.uniform(0, 10), 6)
    padAmount = randint(1, 5)  # starts from 1
    weights = round(random.uniform(0, 10), 6)
    dilated = randint(1, 4)
    size_input = randint(15, 25)
    size_kernel = randint(1, 5)
    channel = randint(2, 5)
    input_channel = 1
    output_channel = 1
    group_num = 1
    stride_num = randint(1, 5)
    print('Pad value: %6f Pad amount: %s Weights: %6f Size input: %s Size kernel: %s Dilated: %s'
          % (padVal, padAmount, weights, size_input, size_kernel, dilated))
    # random inputs
    non_padded_input = torch.randn(1, input_channel, size_input, size_input)
    padder = nn.ConstantPad2d(padAmount, padVal)
    padded_input = padder(non_padded_input)
    # now declare nets
    net_gpu = nn.Conv2d(input_channel, output_channel, size_kernel,
                        padding=0, stride=stride_num, dilation=dilated,
                        groups=group_num, bias=False).cuda()
    net_cpu = nn.Conv2d(input_channel, output_channel, size_kernel,
                        padding=0, stride=stride_num, dilation=dilated,
                        groups=group_num, bias=False)  # can be outside the loop
    # initialize both weight tensors with the same value
    net_gpu.weight.data.fill_(weights)
    net_cpu.weight.data.fill_(weights)
    # forward pass
    output_gpu = net_gpu(padded_input.cuda())
    output_cpu = net_cpu(padded_input)
    # compare with an absolute threshold of 1e-3
    if torch.all(torch.lt(torch.abs(output_gpu - output_cpu.cuda()), 1e-3)):
        pass
    else:
        print('!!!!!!!!!!!!!!!!! BUG IN CODE !!!!!!!!!!!!!!!!!!!!')
        # print('Output gpu:', output_gpu, 'Output cpu:', output_cpu)
        assert False
print('Test is done, good to go!')
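(A side note on the comparison: the fixed absolute threshold of 1e-3 above ignores the magnitude of the outputs, so it can flag large-magnitude results that actually agree to many significant digits. A relative tolerance is usually more robust. A minimal pure-Python sketch, with illustrative values I chose, using the standard library's math.isclose:)

```python
import math

# Two values that agree to ~8 significant digits but differ
# by more than 1e-3 in absolute terms.
x, y = 12345.6789, 12345.6800

print(abs(x - y) < 1e-3)                  # False: absolute check fails
print(math.isclose(x, y, rel_tol=1e-6))   # True: relative check passes
```

For tensors, torch.allclose applies the same idea with both rtol and atol parameters.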
albanD
(Alban D)
October 25, 2018, 2:57pm
4
Hi,
I ran your code sample 10 times and it never reported an issue.
Do you have a special setting where it fails for you? It seems to work fine on my install.