Hi everyone. I recently ran into a problem when computing a dot product with the following code.
hidden_states[batch][i] @ hidden_states[batch][0]
The result is strange: no element of either vector exceeds 1e-2 and the vectors have length 768, so the dot product should be at most about 768 * 1e-2 * 1e-2 ≈ 0.08, yet I get a value over 100.
I guessed it might be caused by floating-point overflow, so I printed a few values, and this is what I got.
print(hidden_states[batch][i][589], hidden_states[batch][0][589])
# output: tensor(0.0248, device='cuda:0', dtype=torch.float16, grad_fn=) tensor(0.0248, device='cuda:0', dtype=torch.float16, grad_fn=)
print(hidden_states[batch][i][:588] @ hidden_states[batch][0][:588], hidden_states[batch][i][:589] @ hidden_states[batch][0][:589])
# output: tensor(3.1836, device='cuda:0', dtype=torch.float16, grad_fn=) tensor(104.6250, device='cuda:0', dtype=torch.float16, grad_fn=)
print(hidden_states[batch][i][:588] @ hidden_states[batch][0][:588] + hidden_states[batch][i][589] * hidden_states[batch][0][589])
# output: tensor(3.1836, device='cuda:0', dtype=torch.float16, grad_fn=)
So it seems that when I sum the products of the first 588 elements, the answer looks correct, but when I sum the first 589 the result jumps to 104+, even though the extra contribution should be tiny. It looks like some kind of overflow or precision issue in float16.
Is there any way to solve this issue? Thank you very much for your help.
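For reference, here is a minimal sketch of the workaround I am considering: upcasting both fp16 vectors to float32 before the dot product, so the accumulation happens in full precision. The vectors `a16` and `b16` below are random stand-ins for my actual hidden states, not the real data.

```python
import torch

# Hypothetical stand-ins for hidden_states[batch][i] and hidden_states[batch][0]:
# length-768 vectors whose elements are all at most 1e-2, stored in float16.
torch.manual_seed(0)
a = torch.rand(768) * 1e-2
b = torch.rand(768) * 1e-2
a16, b16 = a.half(), b.half()

# Workaround sketch: upcast to float32 before the dot product so the
# running sum is accumulated in full precision instead of float16.
dot32 = a16.float() @ b16.float()

# Reference value computed entirely in float32 for comparison.
ref = a @ b

print(dot32.item(), ref.item())
```

With elements this small the float32 result stays well under 0.1, matching the rough upper bound of 768 * 1e-2 * 1e-2 ≈ 0.08 above.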