Column-wise dot product: torch.einsum not matching torch.sum(torch.mul(), axis=0)

I am trying to perform a dot product between the columns of two tensors, and I want to do this as efficiently as possible. However, my two methods do not match up.

My first method, torch.sum(torch.mul(a, b), axis=0), gives me my expected results; torch.einsum('ji, ji -> i', a, b) (taken from Efficient method to compute the row-wise dot product of two square matrices of the same size in PyTorch - Stack Overflow) does not. Reproducible code is below:

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# a has a single column, which is broadcast across the 4 columns of b
a = torch.randn(3, 1, dtype=torch.float).to(device)
b = torch.randn(3, 4, dtype=torch.float).to(device)

print(f"a : \n{a}\n")
print(f"b : \n{b}\n")

# dot product of a's column with b's first column, written out by hand
print(f"Expected:    {a[0,0]*b[0,0] + a[1,0]*b[1,0] + a[2,0]*b[2,0]}")

# method 1: elementwise multiply, then sum over the rows
c = torch.sum(torch.mul(a, b), axis=0)
print(f"sum and mul: {c[0].item()}")

# method 2: einsum contraction over the row index j
d = torch.einsum('ji, ji -> i', a, b)
print(f"einsum:      {d[0].item()}\n")

print(torch.eq(c, d))

The output is:
[screenshot of the program output: the sum/mul and einsum values differ in their last few digits, and torch.eq(c, d) is not all True]

On the CPU (all I did was remove the .to(device) calls), the last line, torch.eq(c, d), is all True. However, I need the tensors to be on the GPU.
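For comparison, a tolerance-based check is the usual way to compare float32 results; a minimal sketch, reusing c and d from the code above:

print(torch.eq(c, d))        # exact equality: can fail on the GPU
print(torch.allclose(c, d))  # should be True within the default rtol/atol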

Also, for some seeds, such as torch.manual_seed(100), the tensors are equal…

I feel like it has to be something with einsum, because I can get my expected answer in other ways (a couple of sketches below).
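For instance, these all give me the same expected values, reusing a and b from above (the torch.mm version is only equivalent here because a has a single column):

c1 = torch.sum(torch.mul(a, b), axis=0)   # broadcast multiply, then sum rows
c2 = (a * b).sum(dim=0)                   # same thing, terser
c3 = torch.mm(a.t(), b).flatten()         # (1, 3) @ (3, 4) -> (1, 4) -> (4,)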

I think these small discrepancies are to be expected, given that float32 only has 7-8 significant digits of precision anyway. You should check that the difference is much smaller with float64.
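Something along these lines shows how the disagreement scales with precision (a minimal sketch, with the same shapes as in your code):

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

for dtype in (torch.float32, torch.float64):
    a = torch.randn(3, 1, dtype=dtype, device=device)
    b = torch.randn(3, 4, dtype=dtype, device=device)
    c = torch.sum(torch.mul(a, b), axis=0)
    d = torch.einsum('ji, ji -> i', a, b)
    # worst-case disagreement between the two methods
    print(dtype, (c - d).abs().max().item())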

Yes, using dtype=torch.float64 made the difference smaller, but do you happen to know why there is a discrepancy at all between torch.sum(torch.mul(a, b), axis=0) and torch.einsum('ji, ji -> i', a, b)?

torch.einsum computes the same quantity using a different sequence of operations, in this case a reshape/view followed by a batched matrix multiply. Since floating-point addition is not associative, accumulating the products in a different order can change the last few bits of the result.
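A minimal sketch of both points; the bmm formulation below illustrates the kind of rewrite involved, not necessarily the exact kernel einsum dispatches to:

import torch

# float32 addition is not associative: grouping changes the result
x = torch.tensor([1.0, 1e8, -1e8], dtype=torch.float32)
print((x[0] + x[1]) + x[2])   # tensor(0.) -- the 1.0 is absorbed by 1e8
print(x[0] + (x[1] + x[2]))   # tensor(1.)

# phrasing 'ji, ji -> i' as a batched matmul: each column pair
# becomes a (1, 3) @ (3, 1) product
a = torch.randn(3, 4)
b = torch.randn(3, 4)
d = torch.einsum('ji, ji -> i', a, b)
e = torch.bmm(a.t().unsqueeze(1), b.t().unsqueeze(2)).flatten()
print(torch.allclose(d, e))   # same values, up to float rounding

Because each formulation accumulates the products in a different order, the float32 results can differ in the last bits, which is exactly what you are seeing.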