# Triplet Loss Returning Zero

Hello,
I was trying to test with triplet loss and got something very weird. So let's say my `anchors`, `pos_embeddings`, and `neg_embeddings` are:

```python
(tensor([[8.0368e+16, 4.2619e+16]]), tensor([[0.2196, 0.2067]]), tensor([[0.1873, 0.9207]]))
```

Now,

`F.triplet_margin_loss(anchors.long(), pos_embeddings.long(), neg_embeddings.long(), margin=1, p=2, reduction='none')` is returning `0`.

I know the anchors are extremely high, but I want to understand triplet loss behavior.

In your code snippet you are transforming the inputs to `LongTensor`s, so `pos_embeddings` and `neg_embeddings` will be truncated to zero and thus be equal:

```python
import torch

anchors = torch.tensor([[8.0368e+16, 4.2619e+16]])
pos_embeddings = torch.tensor([[0.2196, 0.2067]])
neg_embeddings = torch.tensor([[0.1873, 0.9207]])
print(pos_embeddings.long())
> tensor([[0, 0]])
print(neg_embeddings.long())
> tensor([[0, 0]])
```

Given that, the output of the loss function should then be `margin`, which is set to `1`.
However, since the anchor values are extreme, the intermediate distances would be large:

```python
anchors = torch.tensor([[8.0368e+16, 4.2619e+16]])
pos_embeddings = torch.tensor([[0, 0]])
neg_embeddings = pos_embeddings
dist_pos = torch.norm(anchors - pos_embeddings, p=2)
print(dist_pos)
dist_neg = torch.norm(anchors - neg_embeddings, p=2)
print(dist_neg)
```

The next operation would be the addition of the `margin` to the distance and the `clamp` operation:

```python
output = torch.clamp_min(1. + dist_pos - dist_neg, 0)
print(output)
```

Here you can see that `1. + dist_pos - dist_neg` loses the margin: the expression is evaluated left to right, and the addition `1. + dist_pos` has no effect, because the `float32` step size between adjacent representable values is `>1` for values `>2**24`, as described in the Wikipedia article on single-precision floating-point format. Subtracting the (bitwise equal) `dist_neg` afterwards then yields exactly `0`.
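To see the step-size effect in isolation (a minimal sketch, independent of the loss function):

```python
import torch

# At magnitudes >= 2**24, the gap between adjacent float32 values exceeds 1,
# so adding 1 is rounded away entirely.
x = torch.tensor(2.0**24)
print(x + 1 == x)                    # tensor(True): the +1 is lost in float32
print(x.double() + 1 == x.double())  # tensor(False): float64 still resolves +1 here
```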
A potential fix would be to use:

```python
output = torch.clamp_min(1. + (dist_pos - dist_neg), 0)
```

or `float64`.
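Putting the pieces together with the same tensors as above, the grouped version recovers the margin:

```python
import torch

anchors = torch.tensor([[8.0368e+16, 4.2619e+16]])
pos_embeddings = torch.tensor([[0.0, 0.0]])
neg_embeddings = pos_embeddings

dist_pos = torch.norm(anchors - pos_embeddings, p=2)
dist_neg = torch.norm(anchors - neg_embeddings, p=2)

# Left-to-right evaluation: the margin is absorbed into the huge distance.
print(torch.clamp_min(1. + dist_pos - dist_neg, 0))    # tensor(0.)
# Subtracting the (equal) distances first leaves exactly the margin.
print(torch.clamp_min(1. + (dist_pos - dist_neg), 0))  # tensor(1.)
```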

CC @tom what do you think about subtracting the distances first? Could this yield any other (unwanted) issues?


It might work, but TBH neither the cast to `long` nor the large numbers make sense to me.
The original intention of the triplet loss was to have feature vectors from three examples, where the anchor `a` and positive `p` are of the same class and the negative `n` is of a different one. I don't think having anchors live on a completely different scale than `p` and `n` makes much sense for that.
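For comparison, a minimal sketch with made-up embeddings on a comparable scale, where the loss behaves as expected:

```python
import torch
import torch.nn.functional as F

# Hypothetical embeddings of similar magnitude: anchor and positive are close,
# the negative is far away.
anchor = torch.tensor([[0.20, 0.90]])
positive = torch.tensor([[0.25, 0.85]])
negative = torch.tensor([[0.90, 0.10]])

loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2)
print(loss)  # a small positive value: d(a, p) + margin slightly exceeds d(a, n)
```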
