Numerical instability in matrix multiplication

In NumPy, when I have a 3D array X with shape [A, B, C] and a 2D array Y with shape [C, D], np.dot(X, Y) gives a 3D array with shape [A, B, D].
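For instance, a minimal NumPy illustration of this behaviour (shapes chosen to match the example below):

import numpy as np

X = np.random.randn(2, 30, 400)
Y = np.random.randn(400, 400)
# np.dot contracts the last axis of X with the first axis of Y:
# (2, 30, 400) x (400, 400) -> (2, 30, 400)
assert np.dot(X, Y).shape == (2, 30, 400)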

In PyTorch, I can do this in the ways shown below. However, the 2nd method seems to be numerically unstable. How can I fix this?

import numpy as np
import torch
from torch.autograd import Variable

X = Variable(torch.randn(2, 30, 400))
Y = Variable(torch.randn(400, 400))

# 1st method
outs = []
for i in range(X.size(0)):
    out = torch.mm(X[i], Y)
    outs.append(out)
result1 = torch.stack(outs)  # shape of (2, 30, 400)

# 2nd method
result2 = X.resize(2*30, 400).mm(Y)
result2 = result2.resize(2, 30, 400)

# 3rd method
result3 = torch.bmm(X, Y.unsqueeze(0).expand(X.size(0), *Y.size()))

assert np.allclose(result1.data.numpy(), result2.data.numpy())  # this causes an error
assert np.allclose(result1.data.numpy(), result3.data.numpy())
assert np.allclose(result2.data.numpy(), result3.data.numpy())  # this causes an error
assert np.allclose(result2.data.numpy(), result3.data.numpy(), 1e-2)  # this doesn't cause an error
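As a quick check (not part of my original script): if the differences are pure float32 rounding, I would expect the same comparison to pass at the default tolerance in float64:

X64 = X.double()
Y64 = Y.double()
r1 = torch.stack([torch.mm(X64[i], Y64) for i in range(X64.size(0))])
# using .view instead of resize, equivalent here since X64 is contiguous
r2 = X64.view(2 * 30, 400).mm(Y64).view(2, 30, 400)
# if this passes, the mismatch above is rounding error, not a logic bug
assert np.allclose(r1.data.numpy(), r2.data.numpy())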

I couldn’t reproduce the error with your script.
But note that resize might change the underlying data if the requested size doesn’t match the size of the original tensor. In those situations, it’s better to simply use .view, which is guaranteed not to modify your tensor and will raise an error if the number of elements doesn’t match.
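For example, the 2nd method above could be written with .view instead (same computation, just a safe reshape):

# .view checks that the element count matches, so it can never silently
# pad or drop data the way resize can
result2 = X.view(2 * 30, 400).mm(Y).view(2, 30, 400)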


Hi all,

I am facing a similar issue: I have to port a layer from a Theano implementation, and reproducing it in PyTorch leads to small numerical discrepancies.

As an example, I want to multiply two tensors, a and b.
A permutation is applied because the tensors are not laid out the same way in the two implementations.

Here is the code:

import torch
import numpy as np
import theano
import theano.tensor as T

torch.manual_seed(7)

def func_pytorch(a, b):
    # (100, 400, 5) -> (100, 5, 400) -> (500, 400)
    a = a.permute(0, 2, 1).contiguous().view(-1, 400)
    # (400, 400, 10) -> (400, 4000)
    b = b.view(400, -1)
    # (500, 4000) -> (100, 5, 400, 10) -> (100, 400, 10, 5)
    return torch.matmul(a, b).view(100, 5, 400, 10).permute(0, 2, 3, 1)

def func_theano(a, b):
    return T.tensordot(a, b, [[2], [2]])

def func_numpy(a, b):
    return np.tensordot(a, b, [[2], [2]])

a = torch.randn(100, 400, 5)
b = torch.randn(400, 400, 10)

a_p = a.permute(0, 2, 1)  # (100, 5, 400), the layout the Theano code expects
b_p = b.permute(2, 1, 0)  # (10, 400, 400)

out_pytorch = func_pytorch(a, b).numpy()
out_numpy = np.transpose(func_numpy(a_p.numpy(), b_p.numpy()), (0, 3, 2, 1))

out_true = np.transpose(func_theano(a_p.numpy(), b_p.numpy()).eval(), (0, 3, 2, 1))

np.testing.assert_allclose(actual=out_numpy, desired=out_true, rtol=1e-7) # OK
np.testing.assert_allclose(actual=out_pytorch, desired=out_true, rtol=1e-7) # 76% mismatch

We see that there is no problem when comparing the Theano and NumPy results, but we can’t say the same about PyTorch. Does anyone know anything about this? Will there be a function similar to tensordot in PyTorch in an upcoming release?
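As a sanity check (an assumption on my side, not something the comparison above proves): if the mismatch is pure float32 rounding rather than a logic error, repeating it in float64 should pass at rtol=1e-7:

a64 = a.double()
b64 = b.double()
out_pytorch64 = func_pytorch(a64, b64).numpy()
out_numpy64 = np.transpose(
    np.tensordot(a64.permute(0, 2, 1).numpy(), b64.permute(2, 1, 0).numpy(), [[2], [2]]),
    (0, 3, 2, 1),
)
np.testing.assert_allclose(actual=out_pytorch64, desired=out_numpy64, rtol=1e-7)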

Thanks in advance!