Some issues when computing a vector dot product

Hi everyone. I recently ran into an issue when computing a dot product with the following code.

hidden_states[batch][i] @ hidden_states[batch][0]

The result is quite weird: every element of the vectors is no more than 1e-2 in magnitude and the length is 768, so the dot product should be at most about 768 × (1e-2)² ≈ 0.08, but I got a result of 100+.

I guessed it might be caused by floating-point overflow, so I tried printing a few things, and this is what I get.

print(hidden_states[batch][i][589], hidden_states[batch][0][589])
# output: tensor(0.0248, device='cuda:0', dtype=torch.float16, grad_fn=<...>) tensor(0.0248, device='cuda:0', dtype=torch.float16, grad_fn=<...>)

print(hidden_states[batch][i][:588] @ hidden_states[batch][0][:588], hidden_states[batch][i][:589] @ hidden_states[batch][0][:589])
# output: tensor(3.1836, device='cuda:0', dtype=torch.float16, grad_fn=<...>) tensor(104.6250, device='cuda:0', dtype=torch.float16, grad_fn=<...>)

print(hidden_states[batch][i][:588] @ hidden_states[batch][0][:588] + hidden_states[batch][i][589] * hidden_states[batch][0][589])
# output: tensor(3.1836, device='cuda:0', dtype=torch.float16, grad_fn=<...>)

So it seems that when I sum the products of the first 588 elements, the answer is fine, but when I sum the first 589 elements, I get the overflow issue and an incorrect answer.

Is there any way to solve this issue? Thank you very much for helping.

I'm unsure if overflows are the cause, as I would expect to see an invalid result (Inf) in that case; float16 doesn't overflow until values exceed about 65504.
I assume you are not using torch.amp but are manually casting the tensors to float16?
If so, could you post a minimal, executable code snippet which would reproduce the issue?
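For example, something along these lines would do (a minimal sketch; the shapes and values below are made-up stand-ins for your actual activations):

import torch

torch.manual_seed(0)

# Placeholder for the real activations: [batch, seq_len, hidden_dim] in float16.
hidden_states = (torch.randn(1, 2, 768, device="cuda") * 1e-2).half()

a = hidden_states[0][1]
b = hidden_states[0][0]

# Dot product accumulated in float16 vs. a float32 reference.
out_fp16 = a @ b
out_fp32 = a.float() @ b.float()
print(out_fp16, out_fp32)

If the elements really were bounded by 1e-2, both results should be tiny (well under 0.1), and a large float16/float32 mismatch would point at an accumulation problem rather than at your data.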

Sorry bro, it turns out this was my fault: I misestimated the values of the elements in the vector. The value of hidden_states[batch][i][588] increases dramatically to about 10.x, but what I checked in my code above was hidden_states[batch][i][589] :joy:. Anyway, thank you very much for helping.
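For anyone who hits something similar: note that a[:589] covers indices 0 through 588, so the large value at index 588 is included in that dot product even though the element printed at index 589 looks harmless. A quick way to locate the dominant term (a sketch reusing the tensors from the code above):

a = hidden_states[batch][i]
b = hidden_states[batch][0]

prod = a * b                   # elementwise products, one term per index
k = prod.abs().argmax()        # index of the term dominating the sum
print(k, a[k], b[k], prod[k])  # in this thread it points at index 588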