When computing the dot product of two half-precision vectors, PyTorch appears to use float32 for accumulation and only converts the result back to float16 at the end. Is it possible to carry out all operations in float16?
import numpy as np
import torch
row = np.load("row.npy") # (1, 4096)
col = np.load("col.npy") # (4096, 1)
DIM = 4096
# Calculate the output using the dot product function
np.dot(row, col)
# array([[-0.01642]], dtype=float16)
# Accumulate the result in a float16 variable
result = np.float16(0)
for i in range(DIM):
    result += row[0, i] * col[i, 0]
print(result)
# -0.01443
# Accumulate the result in a float32 variable
result = np.float32(0)
for i in range(DIM):
    result += row[0, i] * col[i, 0]
print(result)
# -0.016465545
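For comparison, here is a vectorized float16 accumulation in NumPy that avoids the Python loop. Passing dtype=np.float16 to np.sum keeps the accumulator in float16, although NumPy sums pairwise internally, so the rounding may differ slightly from the naive left-to-right loop:
# Vectorized float16 accumulation over the elementwise products
np.sum(row[0] * col[:, 0], dtype=np.float16)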
row = torch.from_numpy(row).cuda()
col = torch.from_numpy(col).cuda()
# Calculate the output using the dot product function
torch.dot(row[0], col[:, 0])
# tensor(-0.0164, device='cuda:0', dtype=torch.float16)
# Accumulate the result in a float16 variable
result = torch.tensor([0]).half().cuda()
for i in range(DIM):
    result += row[0, i] * col[i, 0]
print(result)
# tensor([-0.0144], device='cuda:0', dtype=torch.float16)
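The loop can also be replaced with an elementwise product followed by a sum. The returned tensor is float16, although I am not certain whether the reduction kernel itself accumulates in float16 or in a wider type internally:
# Vectorized version of the loop above; the output dtype is float16,
# but the kernel may still accumulate in a wider type internally
(row[0] * col[:, 0]).sum()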
So it looks like PyTorch’s dot product produces results very close to NumPy’s, and both appear to use float32 for accumulation.
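As a rough sketch of one way to keep every stored intermediate in float16 on the GPU without the slow Python loop, a pairwise (tree) reduction could be used; note the summation order differs from the left-to-right loop, so the rounding differs as well:
# Pairwise (tree) reduction whose intermediates are all stored as float16
x = row[0] * col[:, 0]                      # elementwise products, float16
while x.numel() > 1:
    if x.numel() % 2:
        x = torch.cat([x, x.new_zeros(1)])  # pad to an even length
    x = x[0::2] + x[1::2]                   # pairwise adds, stored as float16
print(x)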
I can provide the sample row and column files, but the website does not allow me to upload NumPy files.
PyTorch Version: 2.1.1
NumPy Version: 1.26.2