SpatialConvolution / Conv2d: different results using PyTorch and torch7 for float tensors

Hi,
I tried to use convolution in PyTorch and torch7 (Lua).
The same operation on identical float tensors produces different results.

Python code:

import torch
import torch.nn as nn 

torch.set_default_tensor_type('torch.FloatTensor')

def test_conv():
    layer = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=True)
    layer.weight.data.fill_(2.2)
    layer.bias.data.fill_(1.2)  

    tensor = torch.zeros((1, 64, 256, 256))
    tensor.fill_(1.3)
    # print(tensor)

    result = layer(tensor)
    print("[test_conv] result: shape={ %s }, type='%s'\n" % (result.shape, result.type()) )

    result_flatten = result.flatten()

    # print the first 30 output values
    for i, n in enumerate(result_flatten[:30], 1):
        print("[%d] %f" % (i, n.item()))

test_conv()

Result:

 [1] 733.357605
 [2] 1099.435791
 [3] 1099.435791
...
 [30] 1099.435791

Torch7 lua code:

require 'nn'

torch.setdefaulttensortype('torch.FloatTensor')

function test_conv()
    local kernel_size = 3
    local stride = 1
    local padding = 1
    local layer = nn.SpatialConvolutionMM(64, 64, kernel_size, kernel_size, stride, stride, padding, padding)
    layer.weight:fill(2.2) -- fill weights with 2.2
    layer.bias:fill(1.2)   -- fill bias with 1.2

    local tensor = torch.Tensor(1, 64, 256, 256)
    tensor:fill(1.3)       -- fill tensor with 1.3
    -- print(tensor)

    local result = layer(tensor)

    print(string.format("result: shape={ %s }, type='%s'\n", tostring(result:size()), result:type()) )

    local result_flatten = result:view(result:nElement())
    for i = 1, 30 do
        print(string.format("[%d] %f", i, result_flatten[i]))
    end
end 

test_conv()

Result:

[1] 733.360107
[2] 1099.439697
[3] 1099.439697
...
[30] 1099.437012

Difference between results:

       PyTorch            torch7
[1]    733.357605         733.360107
[2]    1099.435791        1099.439697
[3]    1099.435791        1099.439697
...
[30]   1099.435791        1099.437012
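
For reference, a quick double-precision check (a minimal sketch reusing the fill values above; the corner and edge values do not depend on the spatial size, so a small input keeps it cheap) shows the exact value that both float32 results are approximating:

import torch
import torch.nn as nn

# Same layer configuration as above, but in float64.
layer = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=True).double()
layer.weight.data.fill_(2.2)
layer.bias.data.fill_(1.2)

x = torch.full((1, 64, 8, 8), 1.3, dtype=torch.float64)
out = layer(x).flatten()
print(out[0].item())  # corner: 64*4*(2.2*1.3) + 1.2 = 733.36
print(out[1].item())  # edge:   64*6*(2.2*1.3) + 1.2 = 1099.44

Both float32 results agree with these values to within a few parts per million.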

Could this be caused by different floating-point arithmetic in PyTorch and torch7?
Or should I use a different convolution operator?

As you suspect, differences like this are within numerical accuracy and thus would be expected between different implementations of the same operation (this is ~4e-3 on a number of size ~1.1e3, so a relative error of ~4e-6, which is not unusual for float32).
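
For example, plugging the numbers from the two runs above into a quick check:

import torch

a, b = 1099.435791, 1099.439697  # PyTorch value vs. torch7 value
print("relative error: %.1e" % (abs(a - b) / abs(b)))           # ~3.6e-06
print("float32 eps:    %.1e" % torch.finfo(torch.float32).eps)  # ~1.2e-07

# A 3x3 conv over 64 input channels accumulates 64*9 = 576 products per
# output, so a drift of a few dozen ulps between implementations is expected.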

Best regards

Thomas

Thanks for your reply. I’m trying to use a PyTorch pre-trained model in torch7 (Lua) and C (libTNN). It seems these small differences can accumulate into large errors in the output of the entire network (which contains many convolution layers).
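
To illustrate the accumulation (a minimal sketch with hypothetical layer sizes, not the actual model), one can run a stack of conv layers in float32 and compare against a float64 copy of the identical weights:

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
# 20 small conv layers as a stand-in for a deeper network.
net32 = nn.Sequential(*[nn.Conv2d(16, 16, 3, padding=1) for _ in range(20)])
net64 = copy.deepcopy(net32).double()  # identical weights, double precision

y32 = torch.randn(1, 16, 32, 32)
y64 = y32.double()
with torch.no_grad():
    for i, (l32, l64) in enumerate(zip(net32, net64), 1):
        y32, y64 = l32(y32), l64(y64)
        rel = ((y32.double() - y64).abs().max() / y64.abs().max()).item()
        print("after layer %2d: max relative difference %.1e" % (i, rel))

The per-layer relative difference grows with depth, which is consistent with small per-layer differences compounding through a deep network.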

Yeah, unfortunate as it is, something isn’t terribly robust about the model, because errors of that size can happen even when switching backends within PyTorch.
You could try fine-tuning the PyTorch model for a few steps so that it gives answers that more closely match the torch7 ones.
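
A rough sketch of that idea (the perturbed reference layer below is only a stand-in; in practice you would record the torch7 outputs on some inputs and load them as the targets):

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=True)

# Stand-in for the torch7 reference: the same layer with slightly perturbed
# weights. Replace `target` with recorded torch7 outputs in practice.
reference = copy.deepcopy(layer)
with torch.no_grad():
    for p in reference.parameters():
        p.add_(1e-4 * torch.randn_like(p))

inp = torch.randn(8, 64, 32, 32)
target = reference(inp).detach()

opt = torch.optim.Adam(layer.parameters(), lr=1e-5)
for step in range(10):
    opt.zero_grad()
    loss = F.mse_loss(layer(inp), target)
    loss.backward()
    opt.step()
    print("step %2d: loss %.3e" % (step, loss.item()))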