Manually adding biases to convolution gives wrong results

Hi, given a Conv2d module I’d like to perform the bias addition manually, i.e. given a module M with weight w and bias b, I can compute the output y for an input x as y = x * w + b.

What I’d like to do is evaluate x * w using the module’s forward method M(x) and then add the bias b manually. Unfortunately, I cannot simply override the module’s bias, since the bias I want to add may be an N-D tensor, while PyTorch expects a 1-D tensor.

The problem I’m facing is that the results of the manual addition differ from the results of the full forward method when working with float32 tensors, but match exactly when working in float64. Is there any way this manual bias addition can reproduce the forward method’s results in float32? Does the underlying convolution code operate differently from a Python-side module(x) + bias?

Here’s how to reproduce the issue:

import torch
import torch.nn as nn

def test_conv_manual_bias_float32():
    module = nn.Conv2d(3, 64, 3, padding=1)
    x = torch.randn((64, 3, 128, 128))
    y_src = module(x)
    bias = module.bias
    module.bias = None  # remove the bias so module(x) computes x * w only
    y_prop = module(x) + bias[:, None, None]
    print(torch.allclose(y_src, y_prop))

def test_conv_manual_bias_float64():
    module = nn.Conv2d(3, 64, 3, padding=1).double()
    x = torch.randn((64, 3, 128, 128)).double()
    y_src = module(x)
    bias = module.bias
    module.bias = None  # remove the bias so module(x) computes x * w only
    y_prop = module(x) + bias[:, None, None]
    print(torch.equal(y_src, y_prop))

The difference can also be tested using (y_src - y_prop).abs().max(), which returns a value greater than 0 for the first function and 0 for the second.
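For reference, the same check can be written with the functional API, which makes the x * w and + b steps explicit without mutating the module. This is only a sketch of the comparison; the shapes here are smaller than in the tests above and chosen just for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(64, 3, 3, 3)   # (out_channels, in_channels, kH, kW)
b = torch.randn(64)            # one bias value per output channel
x = torch.randn(8, 3, 32, 32)

# Bias added inside the convolution kernel vs. added afterwards in Python.
y_fused = F.conv2d(x, w, bias=b, padding=1)
y_manual = F.conv2d(x, w, bias=None, padding=1) + b[:, None, None]

print(torch.equal(y_fused, y_manual))                 # may be False in float32
print(torch.allclose(y_fused, y_manual, atol=1e-5))   # close within float32 tolerance
```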


The max. absolute error for the FP32 implementation is tensor(8.3447e-07, grad_fn=<MaxBackward1>) on my system, which would be explained by the limited floating point precision.
I don’t know which code path is taken for FP64 on the CPU, but I assume that fewer or no optimizations are applied internally, which could explain the zero difference.
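The rounding effect can be seen without any convolution at all: floating point addition is not associative, so two computations that sum the same terms in a different order can round differently. A minimal illustration in plain Python:

```python
# Floating point addition is not associative: the same three terms summed in
# a different order can produce slightly different results.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)        # False: the two orderings round differently
print(abs(left - right))    # tiny, but non-zero
```

The same mechanism applies inside a convolution, where each output element is a sum of many products whose accumulation order depends on the implementation.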


Is it possible that the C implementation of the convolution bias differs from a simple addition in Python? Because I’m adding the same exact tensor. This problem doesn’t seem to exist when using Linear layers.

Yes, the internal conv implementation (using MKL?) could use a different approach than the native addition used in PyTorch, which could thus yield these small numerical mismatches.
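One way to see how a different internal reduction order produces such mismatches is to compare PyTorch’s built-in sum (whose accumulation order is chosen by the kernel) with an explicit left-to-right loop over the same values. This is only a sketch of the effect, not a claim about which kernel the convolution actually uses:

```python
import torch

torch.manual_seed(0)
vals = torch.randn(10_000)  # float32 by default

s_kernel = vals.sum()       # reduction order chosen by the internal kernel
s_loop = torch.tensor(0.)
for v in vals:              # strictly sequential left-to-right accumulation
    s_loop = s_loop + v

print(torch.equal(s_kernel, s_loop))       # often False in float32
print((s_kernel - s_loop).abs().item())    # small, precision-level difference
```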