Weird behavior of torch.matmul

Hi,

I’m trying to implement a Gaussian process with PyTorch and ran into some weird behavior when using the torch.matmul function, which made me wonder whether I’m using it correctly!

k1 = tensor([[1.0000, 1.0000, 0.9998],
             [1.0000, 1.0000, 1.0000],
             [0.9998, 1.0000, 1.0000]], dtype=torch.float64)

k2 = tensor([[1.0000, 1.0000, 0.9998, 0.9968],
             [1.0000, 1.0000, 1.0000, 0.9976],
             [0.9998, 1.0000, 1.0000, 0.9982]], dtype=torch.float64)

alphas = tensor([[-12851.3568],
                 [ 28321.3013],
                 [-15470.6538]], dtype=torch.float64)

Now comes the weird result: the first three elements of y2 should be the same as y1, but they are totally different!

y1 = torch.matmul(k1.T, alphas) = tensor([[0.8400],
                                          [0.9900],
                                          [0.2900]], dtype=torch.float64)

y2 = torch.matmul(k2.T, alphas) = tensor([[ 0.9685],
                                          [ 0.7068],
                                          [ 0.4447],
                                          [-1.1308]], dtype=torch.float64)

What did I do wrong here?!

Thanks in advance for your tips.

They give the same values; please check your result again.

Thanks for the reply, it’s interesting! When I create these two tensors by hand, the output is correct as expected, but when they (k1, k2) come from my kernel, the result is wrong and equal to what I posted!

To reproduce the problem, I’ll add my kernel below.

import torch

class SQE:
    def __init__(self, l, sigma_f, noise):
        self.l = l
        self.sigma_f = sigma_f
        self.noise = noise

    def __call__(self, xi, xj):
        # scale the inputs by the lengthscale
        xi_ = xi.div(self.l)
        xj_ = xj.div(self.l)

        x1_eq_x2 = torch.equal(xi, xj)
        # center the inputs for numerical stability; distances are
        # translation invariant, so this does not change the result
        adj = xi_.mean(-2, keepdim=True)
        x1 = xi_ - adj
        x2 = xj_ - adj

        x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
        x1_pad = torch.ones_like(x1_norm)

        if x1_eq_x2:
            x2_norm, x2_pad = x1_norm, x1_pad
        else:
            x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
            x2_pad = torch.ones_like(x2_norm)

        # pairwise squared distances in one matmul:
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
        x1_ = torch.cat([-2.0 * x1, x1_norm, x1_pad], dim=-1)
        x2_ = torch.cat([x2, x2_pad, x2_norm], dim=-1)
        res = x1_.matmul(x2_.transpose(-2, -1))

        if x1_eq_x2:
            # force exact zeros on the diagonal for identical inputs
            res.diagonal(dim1=-2, dim2=-1).fill_(0)

        res.clamp_min_(0)
        res.div_(-2).exp_()  # exp(-d^2 / 2)

        res = res * self.sigma_f
        if x1_eq_x2:
            # noise jitter is added only when xi and xj are the same tensor
            res.add_(torch.eye(res.shape[0], dtype=torch.float64) * self.noise)

        return res
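
The cat/matmul part computes all pairwise squared distances in one go via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b (the centering with adj is only for numerical stability, since distances are translation invariant). A quick sketch with arbitrary inputs, if you want to check this yourself:

import torch

x = torch.randn(3, 1, dtype=torch.float64)
z = torch.randn(4, 1, dtype=torch.float64)

# same expansion as in SQE.__call__
d2 = (x.pow(2).sum(-1, keepdim=True)
      + z.pow(2).sum(-1, keepdim=True).transpose(-2, -1)
      - 2.0 * x.matmul(z.transpose(-2, -1))).clamp_min(0)

print(torch.allclose(d2, torch.cdist(x, z).pow(2)))  # True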

Then we have the following samples:


x_1 = torch.tensor([[0.12], [0.13], [0.14]], dtype=torch.float64)
y_1 = torch.tensor([[0.84], [0.99], [0.29]], dtype=torch.float64)

x_2 = torch.tensor([[0.12], [0.13], [0.14], [0.2]], dtype=torch.float64)
y_2 = torch.tensor([[0.84], [0.99], [0.29], [0.84]], dtype=torch.float64)

Defining a kernel like this:

l = torch.tensor(1, dtype=torch.float64)
sf = torch.tensor(1, dtype=torch.float32)

kernel = SQE(l, sf, 1e-5)

Then calculating k1, k2 and alphas like below:

k1 = kernel(x_1, x_1)
k2 = kernel(x_1, x_2)

L = torch.linalg.cholesky(k1)
s1 = torch.linalg.solve_triangular(L, y_1, upper=False)
alphas = torch.linalg.solve_triangular(L.t(), s1, upper=True)

Now the output is like what I posted before!
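
As a sanity check (continuing from the code above), alphas solves k1 @ alphas = y_1 via the Cholesky factor, so multiplying back recovers the training targets; that is why y1 in my first post equals y_1 up to rounding:

print(torch.allclose(k1.matmul(alphas), y_1))  # True, up to solver tolerance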

Your k1 and k2 kernels have a numerical mismatch of 1e-5 on their diagonal (the self.noise jitter is added to k1 because x1_eq_x2 is True there, but not to the cross-covariance k2):

kernel = SQE(l, sf, 1e-5)

k1 = kernel(x_1, x_1)
k2 = kernel(x_1, x_2)

print(k2[:, :3] - k1)
# tensor([[-1.0000e-05,  0.0000e+00,  0.0000e+00],
#         [ 0.0000e+00, -1.0000e-05,  0.0000e+00],
#         [ 0.0000e+00,  0.0000e+00, -1.0000e-05]], dtype=torch.float64)

which is then amplified into a much larger mismatch in the matmul:

y1 = torch.matmul(k1.T, alphas)
y2 = torch.matmul(k2.T, alphas)

print(y1 - y2[:3])
# tensor([[-0.1285],
#         [ 0.2832],
#         [-0.1547]], dtype=torch.float64)

The amplification happens because alphas contains values on the order of 1e4.
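
In fact, since k1 and k2[:, :3] differ by exactly 1e-5 on the diagonal, the difference above is just the perturbation scaled by alphas:

print(1e-5 * alphas)
# tensor([[-0.1285],
#         [ 0.2832],
#         [-0.1547]], dtype=torch.float64)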

You will get the same result if you manually add this mismatch:

k1 = torch.tensor([[1.0000, 1.0000, 0.9998],
                   [1.0000, 1.0000, 1.0000],
                   [0.9998, 1.0000, 1.0000]], dtype=torch.float64)

k2 = torch.tensor([[1.0000, 1.0000, 0.9998, 0.9968],
                   [1.0000, 1.0000, 1.0000, 0.9976],
                   [0.9998, 1.0000, 1.0000, 0.9982]], dtype=torch.float64)

print(k2[:, :3] - k1)
# tensor([[0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.]], dtype=torch.float64)

alphas = torch.tensor([[-12851.3568],
                       [ 28321.3013],
                       [-15470.6538]], dtype=torch.float64)

y1 = torch.matmul(k1.T, alphas)
y2 = torch.matmul(k2.T, alphas)
print(y1 - y2[:3])
# tensor([[0.],
#         [0.],
#         [0.]], dtype=torch.float64)

# add numerical mismatches
k1[torch.arange(3), torch.arange(3)] += 1e-5
y1 = torch.matmul(k1.T, alphas)
y2 = torch.matmul(k2.T, alphas)

print(y1 - y2[:3])
# tensor([[-0.1285],
#         [ 0.2832],
#         [-0.1547]], dtype=torch.float64)

Thanks, nice catch, that’s it!