Numerical floating-point difference between Conv with kernel size 1 and Linear

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

batch_size = 3
num_layers = 2
in_channels = 10
out_channels = 20
kernel_size = 1
stride = 1
padding = 0
dilation = 1
sharing_rates = [0, 0]
bias = True

input = torch.randn(batch_size, in_channels, 1, 1).to(device)
o = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, 1, bias).to(device)
output = o(input)

g = nn.Linear(in_channels, out_channels, bias).to(device)
# copy the conv weights and bias into the linear layer so both compute the same mapping
g.weight.data.copy_(o.weight.data.squeeze())
g.bias.data.copy_(o.bias.data.squeeze())
goutput = g(input.squeeze())

print((output.squeeze() - goutput).abs().sum())
print(torch.eq(output.squeeze(), goutput).all())

I get the following printout:

tensor(1.9670e-06, grad_fn=<SumBackward0>)
tensor(0, dtype=torch.uint8)

I think a Conv layer with kernel size 1 is the same as a Linear operation. Am I wrong?

A difference on the order of 1e-6 points to limited floating-point precision: float32 only carries about 7 significant decimal digits, and the conv and linear kernels accumulate their sums in a different order, so bit-exact equality is not guaranteed even though the two operations are mathematically equivalent.
You could run your code again with DoubleTensors (float64), which should yield a much smaller difference.
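For reference, here is a minimal sketch of the same comparison in double precision (the variable names conv, lin, and x are just illustrative, and the layer construction is simplified to keyword arguments):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# same setup as above, but in float64
x = torch.randn(3, 10, 1, 1, dtype=torch.float64, device=device)
conv = nn.Conv2d(10, 20, kernel_size=1, bias=True).double().to(device)
lin = nn.Linear(10, 20, bias=True).double().to(device)

# copy the conv parameters into the linear layer
with torch.no_grad():
    lin.weight.copy_(conv.weight.squeeze())
    lin.bias.copy_(conv.bias)

out_conv = conv(x).squeeze()
out_lin = lin(x.squeeze())

print((out_conv - out_lin).abs().max())   # should be far smaller than in float32
print(torch.allclose(out_conv, out_lin))  # tolerance-based check instead of exact equality

In general, torch.allclose with a suitable tolerance is a better way to compare the two outputs than torch.eq, since bit-exact equality between different kernels is not guaranteed even in double precision.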