# Column Wise Dot-Product torch.einsum not matching torch.sum(torch.mul(), axis=0)

I am trying to perform a dot product between the columns of two tensors, as efficiently as possible. However, my two methods do not match.

My first method, `torch.sum(torch.mul(a, b), axis=0)`, gives my expected result; `torch.einsum('ji, ji -> i', a, b)` (taken from "Efficient method to compute the row-wise dot product of two square matrices of the same size in PyTorch" on Stack Overflow) does not. Reproducible code is below:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(0)

a = torch.randn(3, 1, dtype=torch.float).to(device)
b = torch.randn(3, 4, dtype=torch.float).to(device)

print(f"a : \n{a}\n")
print(f"b : \n{b}\n")
print(f"Expected (column 0): {a[0,0]*b[0,0] + a[1,0]*b[1,0] + a[2,0]*b[2,0]}")

# c and d have 4 elements (one per column of b), so print the full
# tensors rather than calling .item(), which only works on 1-element tensors.
c = torch.sum(torch.mul(a, b), axis=0)
print(f"sum and mul: {c}")

d = torch.einsum('ji, ji -> i', a, b)
print(f"einsum:      {d}\n")

print(torch.eq(c, d))
```

Notes:
On the CPU (all I did was remove the `.to(device)`), the last line, `torch.eq(c, d)`, is all `True`; however, I need the tensors to be on the GPU.

Also, for some seeds, such as `torch.manual_seed(100)`, the tensors are equal…

I feel like it has to be something with `einsum`, because I can get my expected answer in other ways.

I think small discrepancies like these are to be expected, given that float32 only has about 7 decimal digits of precision. You should check that the difference is much smaller with float64.
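One way to check this is to compute the maximum absolute difference between the two methods in both precisions and compare the results with a tolerance-aware test like `torch.allclose` instead of exact equality (a minimal sketch; run it on whichever device is available, since on CPU the difference may be exactly zero):

```python
import torch

torch.manual_seed(0)
a32 = torch.randn(3, 1, dtype=torch.float32)
b32 = torch.randn(3, 4, dtype=torch.float32)
a64, b64 = a32.double(), b32.double()

# Maximum absolute difference between the two methods, per dtype.
diff32 = (torch.sum(a32 * b32, axis=0)
          - torch.einsum('ji,ji->i', a32, b32)).abs().max()
diff64 = (torch.sum(a64 * b64, axis=0)
          - torch.einsum('ji,ji->i', a64, b64)).abs().max()

print(f"max |diff| float32: {diff32}")
print(f"max |diff| float64: {diff64}")

# Within floating-point tolerance the two methods agree.
assert torch.allclose(torch.sum(a32 * b32, axis=0),
                      torch.einsum('ji,ji->i', a32, b32))
```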

Yes, using `dtype=torch.float64` made the difference smaller, but do you happen to know why there is a discrepancy at all between `torch.sum(torch.mul(a, b), axis=0)` and `torch.einsum('ji, ji -> i', a, b)`?

`torch.einsum` computes the same quantity using a different sequence of operations; in this case, it reshapes the operands and dispatches to a batched matrix multiply. Because floating-point addition is not associative, summing the products in a different order (or with a different kernel) can yield slightly different rounding.
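As an illustration, one plausible lowering of `'ji, ji -> i'` treats each column `i` as a batch element and performs a `(1×3) @ (3×1)` multiply per column via `torch.bmm`. This is only a sketch of the idea; the actual kernel einsum chooses depends on the backend and device:

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 1)
b = torch.randn(3, 4)

# Broadcast a's single column across b's columns (expand makes no copy).
a_exp = a.expand(3, 4)                     # (3, 4)
lhs = a_exp.T.unsqueeze(1)                 # (4, 1, 3): one row vector per column
rhs = b.T.unsqueeze(2)                     # (4, 3, 1): one column vector per column
via_bmm = torch.bmm(lhs, rhs).reshape(-1)  # (4,): one dot product per column

direct = torch.sum(a * b, axis=0)          # elementwise multiply, then reduce
print(via_bmm)
print(direct)
assert torch.allclose(via_bmm, direct)
```

The two paths compute the same mathematical quantity, but the reductions may run in a different order inside the matmul kernel, which is where the float32 discrepancy comes from.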