Getting incorrect subtraction result

Hiba_Ahsan · November 10, 2019, 5:38am

I have a matrix X of shape (N, D) and a matrix mu of shape (D, K). I tried subtracting one column of mu from one row of X and am getting the wrong answer. When I convert both to NumPy arrays, I get the right answer. What am I doing wrong?

print(X[0,:])
print(mu[:,0])
print(X[0,:]-mu[:, 0])
print(X[0,:].data.numpy()-mu[:, 0].data.numpy())

tensor([-2.4682, 0.9872, -0.4661, 1.7443, 0.5375, -0.5948, 2.2457, -0.2980,
1.2467, 0.2932], dtype=torch.float64)
tensor([-1.5256, -0.6092, -0.7773, 0.4676, 0.8657, -0.1759, -0.7981, -0.4370,
0.2152, -0.8696], grad_fn=)
tensor([ -9.4257e-01, 1.1631e+00, -4.6614e-01, 1.7443e+00, 5.3751e-01,
-5.9480e-01, 2.2457e+00, -1.1146e+171, 2.1892e+00, 8.8797e-01],
dtype=torch.float64, grad_fn=<SubBackward0>)
[-0.94256972 1.59635256 0.31117838 1.27669038 -0.32818301 -0.41886666
3.0437956 0.13902091 1.03149757 1.16275023]

ptrblck · November 10, 2019, 6:51am

Which PyTorch version are you using?
Is this error reproducible? If so, could you please upload the tensors somewhere so that we could have a look?

Hiba_Ahsan · November 11, 2019, 3:56am

This was happening when I read data from a numpy array to a Variable X:
X = Variable(torch.from_numpy(X))
mu = Variable(torch.randn(D, K), requires_grad=True)

But when I changed it to X = torch.FloatTensor(X), subtraction was fine. What is the reason for this?

ptrblck · November 11, 2019, 7:42am

Variables have been deprecated since version 0.4.0, so you shouldn’t use them anymore.
Just out of curiosity, do you have a code snippet to reproduce this issue?
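
For readers following along, here is a minimal sketch of the same setup without `Variable`, assuming a PyTorch release of 0.4.0 or later. The sizes `N`, `D`, `K` and the name `X_np` are placeholders chosen for illustration; they are not taken from the original code.

```python
import numpy as np
import torch

N, D, K = 4, 10, 5  # hypothetical sizes, for illustration only

X_np = np.random.randn(N, D)                 # float64 NumPy array, like the original X
X = torch.from_numpy(X_np)                   # plain tensor sharing memory with X_np
mu = torch.randn(D, K, requires_grad=True)   # leaf tensor that tracks gradients

# Tensors carry requires_grad themselves, so no Variable wrapper is needed.
print(X.dtype, X.requires_grad)
print(mu.dtype, mu.requires_grad)
```

Note that `X` keeps NumPy's float64 dtype, while `mu` uses PyTorch's default dtype (float32 unless it has been changed).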

Hiba_Ahsan · November 11, 2019, 2:07pm

X is a numpy array of shape (N,D)
X = Variable(torch.from_numpy(X))
mu = Variable(torch.randn(D, K), requires_grad=True)

print(X[0,:]-mu[:,0])

beaupreda · November 11, 2019, 2:48pm

I think the problem comes from the different data types of X (float64) and mu (float32). When you do X = torch.FloatTensor(X), you change the type of X to float32, hence the correct result. If you set the type of mu to float64 and leave X unchanged, everything should also work. As @ptrblck said, Variable is deprecated, so here is an example of what I have with PyTorch 1.0:

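    import numpy as np
    import torch

    N = 10
    D = 5
    K = 2
    X = np.random.rand(N, D)  # float64 NumPy array
    X = torch.from_numpy(X)   # the tensor keeps the float64 dtype
    mu = torch.randn(D, K, dtype=torch.float64)  # match X's dtype
    res = X[0, :] - mu[:, 0]
    print(X[0, :])
    print(mu[:, 0])
    print(res)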
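
The two options described above (moving X to float32, or moving mu to float64) can also be written as explicit casts at the point of subtraction. This is only a sketch with made-up sizes. It does not explain the wrong values by itself, but it avoids relying on mixed-dtype arithmetic.

```python
import numpy as np
import torch

N, D, K = 10, 5, 2  # same illustrative sizes as in the example above

X = torch.from_numpy(np.random.rand(N, D))  # float64, inherited from NumPy
mu = torch.randn(D, K)                      # float32, PyTorch's default dtype

# Option 1: promote the mu column to float64 so it matches X.
res64 = X[0, :] - mu[:, 0].double()

# Option 2: demote the X row to float32 so it matches mu (loses some precision).
res32 = X[0, :].float() - mu[:, 0]

print(res64.dtype)  # torch.float64
print(res32.dtype)  # torch.float32
```

Option 1 keeps the full precision of the NumPy data, which is usually the safer choice when the data arrives as float64.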

ptrblck · November 11, 2019, 5:26pm

That’s a good point, but the results of the PyTorch and numpy subtraction have a huge difference:

a = torch.tensor([ -9.4257e-01, 1.1631e+00, -4.6614e-01, 1.7443e+00, 5.3751e-01,
                  -5.9480e-01, 2.2457e+00, -1.1146e+171, 2.1892e+00, 8.8797e-01],
                  dtype=torch.float64)
b = torch.tensor([-0.94256972, 1.59635256, 0.31117838, 1.27669038, -0.32818301,
                  -0.41886666, 3.0437956, 0.13902091, 1.03149757, 1.16275023],
                  dtype=torch.float64)

print(a - b)
> tensor([ -2.8000e-07,  -4.3325e-01,  -7.7732e-01,   4.6761e-01,   8.6569e-01,
         -1.7593e-01,  -7.9810e-01, -1.1146e+171,   1.1577e+00,  -2.7478e-01],
       dtype=torch.float64)

Also, the PyTorch output looks a bit strange, as it contains -1.1146e+171.

@Hiba_Ahsan Could you upload the input tensors somewhere, so that we can have a look?

Hiba_Ahsan · November 11, 2019, 5:51pm

This is the kind of weird output I was getting as well: huge exponents. I don’t think I can upload the data for X, since it is someone else’s private dataset. X is just a float64 NumPy array, and I generated mu using torch.randn. You are able to reproduce the issue without my data anyway!

ptrblck · November 11, 2019, 5:53pm

Unfortunately I cannot reproduce this issue.
Neither with tensors (in FP32 and FP64) nor with the deprecated Variable type.

Hiba_Ahsan · November 11, 2019, 6:18pm

X = Variable(torch.from_numpy(xtr))
mu = Variable(torch.randn(10, 5), requires_grad=True)
print(X[0,:]-mu[:,0])
tensor([ -2.6652e+00, 2.0585e+00, -4.6961e-01, 1.7443e+00, -4.7861e+180,
-6.3082e+199, -1.6779e+243, -9.3102e+242, 1.2467e+00, 2.9316e-01],
dtype=torch.float64, grad_fn=<SubBackward0>)
print(X[0,:])
tensor([-2.4682, 0.9872, -0.4661, 1.7443, 0.5375, -0.5948, 2.2457, -0.2980,
1.2467, 0.2932], dtype=torch.float64)
print(mu[:,0])
tensor([ 0.1971, 1.0803, -0.8145, 0.9733, -1.4811, -1.0713, 0.3086, 0.6137,
-0.6779, 0.3576], grad_fn=)
print(X[0,:].data.numpy()-mu[:,0].data.numpy())
[-2.66522103 -0.09317447 0.34836983 0.77099426 2.01856725 0.47648397
1.93714274 -0.91173426 1.92456246 -0.06439294]

ptrblck · November 11, 2019, 6:31pm

xtr = np.random.randn(5, 10)
xtr[0] = [-2.4682, 0.9872, -0.4661, 1.7443, 0.5375, -0.5948, 2.2457, -0.2980, 1.2467, 0.2932]
X = Variable(torch.from_numpy(xtr))
mu = Variable(torch.randn(10, 5), requires_grad=True)
mu[:, 0] = torch.tensor([ 0.1971, 1.0803, -0.8145, 0.9733, -1.4811, -1.0713, 0.3086, 0.6137, -0.6779, 0.3576])

res1 = X[0,:]-mu[:,0]
res2 = X[0,:].data.numpy()-mu[:,0].data.numpy()
print((res1.detach().numpy()-res2))
> [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
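
The comparison above can be wrapped in a small helper so that the check is easy to rerun on different installs. This is a sketch: the function name `max_abs_diff` is made up for illustration, and the inputs are random rather than the values from this thread.

```python
import numpy as np
import torch

def max_abs_diff(x_row, mu_col):
    """Largest absolute gap between the PyTorch and NumPy subtraction results."""
    # Mixed-dtype subtraction done by PyTorch (float64 row minus float32 column).
    torch_res = (x_row - mu_col).detach().numpy()
    # The same subtraction done by NumPy on the underlying arrays.
    numpy_res = x_row.detach().numpy() - mu_col.detach().numpy()
    return float(np.max(np.abs(torch_res - numpy_res)))

xtr = np.random.randn(5, 10)                  # float64 data, random stand-in
X = torch.from_numpy(xtr)
mu = torch.randn(10, 5, requires_grad=True)   # float32 parameters

gap = max_abs_diff(X[0, :], mu[:, 0])
print(gap)
```

On an unaffected install the printed gap should be zero or very close to it. Values with huge exponents, like the ones reported in this thread, would show up as a very large gap.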

Hiba_Ahsan · November 11, 2019, 7:05pm

I don’t understand. My version is 1.3. Also, what was the snippet with tensors a and b that you posted earlier, which showed a huge difference?

ptrblck · November 11, 2019, 9:07pm

For a and b, I just initialized the tensors with your results to visualize the error.
So it wasn’t a reproduction of the bug, just an illustration of how large the difference in your output is.

Hiba_Ahsan · November 11, 2019, 9:40pm

xtr = np.random.randn(5, 10)
xtr[0] = [-2.4682, 0.9872, -0.4661, 1.7443, 0.5375, -0.5948, 2.2457, -0.2980, 1.2467, 0.2932]
X = Variable(torch.from_numpy(xtr))
mu = Variable(torch.randn(10, 5), requires_grad=True)
mu[:, 0] = torch.tensor([ 0.1971, 1.0803, -0.8145, 0.9733, -1.4811, -1.0713, 0.3086, 0.6137, -0.6779, 0.3576])
>>> X[0,:]-mu[:,0]
tensor([ -2.6653e+00,   2.0585e+00,  -4.6610e-01, -1.2407e+214,   5.3750e-01,
         -5.9480e-01,   2.2457e+00, -1.7297e+156,  -1.7331e+97,   2.9320e-01],
       dtype=torch.float64, grad_fn=<SubBackward0>)

>>> X[0,:].data.numpy()-mu[:,0].data.numpy()
array([-2.6653    , -0.09309997,  0.34839997,  0.77100002,  2.01859996,
        0.47650003,  1.93709999, -0.91169997,  1.92460002, -0.0644    ])

I followed the same steps and am still facing this issue.

ptrblck · November 12, 2019, 2:04am

Great, it seems you are able to reproduce this issue!
Back to my initial question: which PyTorch version are you currently using?

Hiba_Ahsan · November 12, 2019, 2:47am

My PyTorch version is 1.3.0.

ptrblck · November 12, 2019, 4:14am

Thanks for the version number!
I could reproduce a high error (although not in the same value range).

@albanD Could this be related to 28010?

@Hiba_Ahsan Could you please update to PyTorch 1.3.1, as I cannot reproduce it in the latest release?

albanD · November 12, 2019, 4:40am

I think this is exactly the problem in 28010 and it was fixed in PR #28231. This should be fixed in 1.3.1. Can you test and let us know @Hiba_Ahsan?

Hiba_Ahsan · November 13, 2019, 1:51pm

I am trying to update but it says “Requirement already up-to-date: torch in \anaconda3\lib\site-packages (1.3.0)”.

albanD · November 13, 2019, 4:33pm

Hi,

The packages are properly updated online: https://anaconda.org/pytorch/pytorch
You might want to upgrade conda to make sure you can get the latest libraries?
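
To confirm which release is actually being imported after an upgrade attempt, the version string can be compared against 1.3.1, the release in which the fix is said above to have landed. The helper names `version_tuple` and `has_fix` below are hypothetical. The sketch uses literal version strings so that it runs without PyTorch installed.

```python
import re

def version_tuple(version):
    """Turn a string such as '1.3.1' or '1.3.1+cpu' into a tuple of ints."""
    core = version.split("+")[0]            # drop local build suffixes like '+cpu'
    numbers = re.findall(r"\d+", core)[:3]  # keep major, minor, patch
    return tuple(int(n) for n in numbers)

def has_fix(version, minimum="1.3.1"):
    """True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)

# In a real session, pass torch.__version__ instead of a literal string.
print(has_fix("1.3.0"))  # False
print(has_fix("1.3.1"))  # True
```

If this still reports an old version after upgrading, the interpreter is probably importing PyTorch from a different environment than the one that was updated.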