PyTorch Conv2d vs Numpy reference; different outcomes. Rounding error or mistake?

Hi,

I’m currently experimenting with some custom layers (based on NumPy). For this I tried to write a reference conv2d function, to make sure that I correctly understand the inner workings. However, I ran into an issue: some of the output elements are not completely identical to PyTorch’s. Is there a way to fix this?

I attached some sample code that explains the problem I’m facing. I would really appreciate any feedback/help.

Kind regards,

Barry

import numpy as np
import torch
import torch.nn.functional as F
from torch.nn.parameter import Parameter

def DConv2dFunction(input, param, bias, stride, padding, dilation=1, groups=1):
    """Reference conv2d routine (naive seven-loop implementation)."""
    assert dilation == 1, "no support for larger dilations"
    assert groups == 1, "no support for larger groups"
  
    # Convert input, weights and bias to numpy.
    d = input.data.cpu().numpy()
    d_pad = np.pad(d
              , pad_width=((0,),(0,),(padding,),(padding,))
              , mode='constant', constant_values=0)
    p = param.data.cpu().numpy()
    b = bias.data.cpu().numpy()     
    
    # Dimensions: input d is (N, Kc, H, W); weights p are (C, Kc, Kh, Kw).
    N, Kc, H, W = d.shape
    C, Kc, Kh, Kw = p.shape
    W_out = int(np.floor((W + 2*padding - dilation*(Kw-1) - 1) / stride + 1))
    H_out = int(np.floor((H + 2*padding - dilation*(Kh-1) - 1) / stride + 1))
    
    # Compute output samples with a naive seven-loop convolution.
    # Accumulate in the input's dtype, so a double-precision input
    # also yields a double-precision reference.
    res = np.zeros((N, C, H_out, W_out), dtype=d.dtype)
    for n in range(N):
      for to in range(C):
        for y in range(H_out):
          for x in range(W_out):
            for ti in range(Kc):
              for ky in range(Kh):
                for kx in range(Kw):
                  res[n,to,y,x] += p[to,ti,ky,kx] * d_pad[n,ti,(y*stride)+ky,(x*stride)+kx]  
            res[n,to,y,x] += b[to]
    
    return Parameter(torch.from_numpy(res)).type_as(input)

# Layer dimensions
# N; batch size, C; output channels, H/W; feature map dimensions
# Kc; input channels, Kh; kernel height, Kw; kernel width
N, C, H, W = 1, 8, 28, 28
Kc, Kh, Kw = 16, 5, 5
padding=2
stride=1
dilation=1
groups=1

# Create random input, weights, and biases
torch.manual_seed(99)
if True: # set to false for integer values
  d = Parameter(torch.randn(N, Kc, H, W).type(torch.FloatTensor))
  p = Parameter(torch.randn(C, Kc, Kh, Kw).type(torch.FloatTensor))
  b = Parameter(torch.randn(C).type(torch.FloatTensor))
else:
  d = Parameter(torch.randn(N, Kc, H, W).mul(255).round().type(torch.FloatTensor))
  p = Parameter(torch.randn(C, Kc, Kh, Kw).mul(255).round().type(torch.FloatTensor))
  b = Parameter(torch.randn(C).mul(255).round().type(torch.FloatTensor))
  
# Compute result using PyTorch routine
res1 = F.conv2d(d, p, b, stride, padding, dilation, groups)
out1 = res1.cpu().data.numpy().flatten()

# Compute result using Reference routine
res2 = DConv2dFunction(d, p, b, stride, padding, dilation, groups)
out2 = res2.cpu().data.numpy().flatten()

# Check approximate equality per element
max_relative_difference = 1e-4
z = np.isclose(out1, out2, rtol=max_relative_difference, atol=0)
print("Number of elements that exceed margin: {}/{}".format(np.count_nonzero(~z), z.size))
if np.count_nonzero(~z) > 0:
  print("expected -> actual")
  for pytorch, reference in zip(out1[~z], out2[~z]):
      print("{:.7f} -> {:.7f}".format(pytorch, reference))

Which outputs:

  Number of elements that exceed margin: 10/6272
  expected -> actual
  -0.0515766 -> -0.0515714
  0.0141878 -> 0.0141795
  -0.0277182 -> -0.0277148
  -0.0202761 -> -0.0202698
  0.0075884 -> 0.0075939
  -0.0075113 -> -0.0075098
  -0.0163350 -> -0.0163383
  -0.0050793 -> -0.0050831
  0.0368888 -> 0.0368946
  0.0233984 -> 0.0234014

This shows that 10 out of 6272 output samples are slightly different, as can be seen in the printout.

They’re not going to be identical due to different orderings of floating-point addition. If you want to compare against a reference implementation, you should use double-precision numbers in the reference. You may also want to try comparing the reference implementation against itself with single- and double-precision numbers.
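
For instance, here is a minimal standalone sketch (my own illustration, not part of the script above) showing that float32 summation depends on the accumulation order:

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(10000).astype(np.float32)

# Accumulate the same values in two different orders.
s_fwd = np.float32(0.0)
for v in a:
    s_fwd += v

s_rev = np.float32(0.0)
for v in a[::-1]:
    s_rev += v

# The sums typically differ in the last bits: float32 addition is not
# associative, so the order of accumulation changes the rounding.
print(s_fwd, s_rev, s_fwd == s_rev)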

Hi, thank you for your reply.

Do you have any suggestions on how it should be ordered? I think only the kernel computation order matters. I tried reordering the kernel computation (and adding the bias up front instead of at the end), but with little success.

edit: well, of course, the PyTorch back end probably uses GEMM (or something similar) for convolution, and optimized MatMul routines may even change the number or order of multiplications and accumulations.
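
To make the GEMM remark concrete: an im2col-based convolution lowers every input patch into a column and then performs the whole convolution as one matrix multiply. A rough sketch (my own illustration, assuming a padded input d_pad and weights p laid out as in the reference function above):

import numpy as np

def conv2d_im2col(d_pad, p, b, stride):
    # Shapes: d_pad is (N, Kc, Hp, Wp), p is (C, Kc, Kh, Kw), b is (C,).
    N, Kc, Hp, Wp = d_pad.shape
    C, _, Kh, Kw = p.shape
    H_out = (Hp - Kh) // stride + 1
    W_out = (Wp - Kw) // stride + 1

    # im2col: every output position becomes one column of patch values.
    cols = np.empty((N, Kc * Kh * Kw, H_out * W_out), dtype=d_pad.dtype)
    for y in range(H_out):
        for x in range(W_out):
            patch = d_pad[:, :, y*stride:y*stride+Kh, x*stride:x*stride+Kw]
            cols[:, :, y * W_out + x] = patch.reshape(N, -1)

    # One matmul: (C, Kc*Kh*Kw) @ (N, Kc*Kh*Kw, H_out*W_out).
    # Its internal reduction order differs from the naive seven-loop
    # version, which is exactly where the rounding differences come from.
    w = p.reshape(C, -1)
    out = w @ cols + b.reshape(1, C, 1)
    return out.reshape(N, C, H_out, W_out)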

In the end, these few rounding errors do matter to me: I tried to substitute some PyTorch Conv2d layers with my own reference implementation in some experiments, and the network outcome (accuracy and loss) was affected.

I guess there is no easy solution for this problem :(

The easy solution is to use double-precision floating point numbers (torch.DoubleTensor).

I wasn’t suggesting a different ordering: I’m suggesting that you compare double vs. float in a single implementation. You will see relative errors larger than 1e-4 in the small outputs due to cancellation.
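
Concretely, something like this (my sketch, reusing DConv2dFunction and the tensors from the script above; it assumes the reference accumulates in the input's dtype, as in the version above):

# Run the same reference implementation in float32 and float64 and
# compare it against itself; the mismatches it reports are of the same
# order as the PyTorch-vs-reference mismatches above.
d64 = Parameter(d.data.double())
p64 = Parameter(p.data.double())
b64 = Parameter(b.data.double())

ref32 = DConv2dFunction(d, p, b, stride, padding).data.numpy().flatten()
ref64 = DConv2dFunction(d64, p64, b64, stride, padding).data.numpy().flatten()

z = np.isclose(ref64, ref32, rtol=1e-4, atol=0)
print("float32 vs float64 mismatches: {}/{}".format(np.count_nonzero(~z), z.size))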

The point is that this isn’t a mistake or bug. It’s within the expected error due to floating-point arithmetic.
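
A toy example of that cancellation effect (my own illustration):

import numpy as np

# Two nearly cancelling terms: the result is small, so a tiny absolute
# rounding error turns into a large *relative* error.
x = (np.float32(1000.0) + np.float32(0.01)) - np.float32(1000.0)
y = (np.float64(1000.0) + 0.01) - 1000.0   # float64 reference
print(x, y, abs(x - y) / abs(y))           # relative error well above 1e-4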