In your code snippet you are transforming the inputs to `LongTensors`, so `pos_embeddings` and `neg_embeddings` will be equal:

```
import torch

anchor = torch.tensor([[8.0368e+16, 4.2619e+16]], requires_grad=True)
pos_embeddings = torch.tensor([[0.2196, 0.2067]])
neg_embeddings = torch.tensor([[0.1873, 0.9207]])

# casting to long truncates the fractional parts to zero
print(pos_embeddings.long())
> tensor([[0, 0]])
print(neg_embeddings.long())
> tensor([[0, 0]])
```

Given that, the output of the loss function should then be `margin`, which is set to `1`.
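As a quick sanity check with small-magnitude values (where the precision issue described below does not kick in), the loss does reduce to the margin for identical positive and negative embeddings. This is a minimal sketch assuming the loss is `nn.TripletMarginLoss`; the tensor values are made up for illustration:

```
import torch
import torch.nn as nn

# with pos == neg the two distances are identical and cancel,
# so the loss reduces to the margin (default margin=1.0)
anchor = torch.tensor([[0.5, -0.3]])
pos = torch.zeros(1, 2)
neg = pos.clone()

criterion = nn.TripletMarginLoss(margin=1.0)
print(criterion(anchor, pos, neg))
> tensor(1.)
```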

However, the intermediate distances would be large:

```
pos_embeddings = torch.tensor([[0, 0]])
neg_embeddings = pos_embeddings
# both distances are equal and huge
dist_pos = torch.norm(anchor - pos_embeddings, p=2)
print(dist_pos)
> tensor(1.2923e+10, grad_fn=<NormBackward1>)
dist_neg = torch.norm(anchor - neg_embeddings, p=2)
print(dist_neg)
> tensor(1.2923e+10, grad_fn=<NormBackward1>)
```

The next operation would be the addition of the `margin` to the distance and the `clamp` operation:

```
output = torch.clamp_min(1. + dist_pos - dist_neg, 0)
print(output)
> tensor(0., grad_fn=<ClampMinBackward0>)
```

Here you can see that `1. + dist_pos - dist_neg` evaluates to `0.` instead of the expected `1.`: the expression is computed from left to right, so the `1.` is added to the huge `dist_pos` first and is lost to rounding. The reason is the step size, which is `>1` for `float32` values `>2**24`, as described in [this Wikipedia article](https://en.wikipedia.org/wiki/Single-precision_floating-point_format).
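You can verify the absorption directly (the magnitude here matches the printed distance above):

```
import torch

x = torch.tensor(1.2923e10)  # float32 by default
# adding 1. has no effect, since the spacing between adjacent
# float32 values at this magnitude is ~1.5e3
print(x + 1. == x)
> tensor(True)
```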

A potential fix would be to use:

```
output = torch.clamp_min(1. + (dist_pos - dist_neg), 0)  # equal distances cancel first
```

or `float64`.
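A minimal sketch of both options, using the distance magnitude from above:

```
import torch

dist_pos = torch.tensor(1.2923e10)
dist_neg = torch.tensor(1.2923e10)

# evaluated left to right: (1. + dist_pos) absorbs the margin
print(torch.clamp_min(1. + dist_pos - dist_neg, 0))
> tensor(0.)

# grouping the subtraction first lets the equal distances cancel exactly
print(torch.clamp_min(1. + (dist_pos - dist_neg), 0))
> tensor(1.)

# alternatively, float64 has a small enough step size at this magnitude
print(torch.clamp_min(1. + dist_pos.double() - dist_neg.double(), 0))
> tensor(1., dtype=torch.float64)
```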

CC @tom, what do you think about subtracting the distances first? Could this yield any other (unwanted) issues?