When computing the dot product of two half-precision vectors, PyTorch appears to use float32 for accumulation and only converts the final output back to float16. Is it possible to carry out all of the operations in float16?

```
import numpy as np
import torch
row = np.load("row.npy") # (1, 4096)
col = np.load("col.npy") # (4096, 1)
DIM = 4096
# Calculate the output using the dot product function
np.dot(row, col)
# array([[-0.01642]], dtype=float16)
# Accumulate the result in a float16 variable
result = np.float16(0)
for i in range(DIM):
    result += row[0, i] * col[i, 0]
print(result)
# -0.01443
# Accumulate the result in a float32 variable
result = np.float32(0)
for i in range(DIM):
    result += row[0, i] * col[i, 0]
print(result)
# -0.016465545
row = torch.from_numpy(row).cuda()
col = torch.from_numpy(col).cuda()
# Calculate the output using the dot product function
torch.dot(row[0], col[:, 0])
# tensor(-0.0164, device='cuda:0', dtype=torch.float16)
# Accumulate the result in a float16 variable
result = torch.tensor([0]).half().cuda()
for i in range(DIM):
    result += row[0, i] * col[i, 0]
print(result)
# tensor([-0.0144], device='cuda:0', dtype=torch.float16)
```

So it looks like PyTorch’s dot product produces a result very close to NumPy’s, and both match the explicit float32 accumulation rather than the float16 one, which suggests both libraries accumulate in float32 and only cast the final value back to float16. The gap between the two loops (-0.0144 vs. -0.0165) is expected: float16 has a 10-bit mantissa (roughly 3 decimal digits), so sequentially summing 4096 products in float16 accumulates noticeable rounding error.
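
Since I can't attach the .npy files, here is a minimal self-contained sketch using random stand-in values of the same shape and dtype (the numbers will differ from mine, but it should show the same effect): if both libraries really accumulate in float32, their float16 dot products should match the explicit float32 accumulation cast back to float16, not the float16 loop. The torch part mirrors the CUDA setup above.

```
import numpy as np
import torch

# Stand-in inputs with the same shapes and dtype as row.npy / col.npy (values are random).
rng = np.random.default_rng(0)
row = rng.standard_normal((1, 4096)).astype(np.float16)
col = rng.standard_normal((4096, 1)).astype(np.float16)

# Explicit float32 accumulation, cast back to float16 at the end.
via_fp32 = np.dot(row.astype(np.float32), col.astype(np.float32)).astype(np.float16)[0, 0]

# Explicit float16 accumulation.
via_fp16 = np.float16(0)
for i in range(4096):
    via_fp16 += row[0, i] * col[i, 0]

# Library dot products with float16 inputs and float16 outputs (torch on CUDA, as above).
np_result = np.dot(row, col)[0, 0]
torch_result = torch.dot(torch.from_numpy(row)[0].cuda(),
                         torch.from_numpy(col)[:, 0].cuda()).item()

print("float32 accumulation:", via_fp32)
print("float16 accumulation:", via_fp16)
print("np.dot:              ", np_result)
print("torch.dot:           ", torch_result)
# If np.dot / torch.dot agree with the float32-accumulated value (and not the
# float16 one), the reduction is presumably done in float32 internally.
```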

I can provide the sample row and column files, but the website does not allow me to upload NumPy files.

PyTorch Version: 2.1.1

NumPy Version: 1.26.2