Since you would like to maximize the cosine similarity, I would go with the first approach: in the worst case you'll add 0.01 * 2 to the loss, while in the best (fully trained) case the term becomes 1 - 1 = 0. Here is a small dummy example that just rotates a tensor:
import torch
import torch.nn as nn

x = torch.tensor([[1., 1.]], requires_grad=True)
y = torch.tensor([[-1., 0.]])
criterion = nn.CosineSimilarity()
optimizer = torch.optim.SGD([x], lr=1.)

for _ in range(20):
    optimizer.zero_grad()
    loss = 1. - criterion(x, y)
    loss.backward()
    optimizer.step()
    print('x: {}, loss: {}'.format(x, loss.item()))
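Since the cosine similarity lies in [-1, 1], the term 1 - cos(x, y) is bounded in [0, 2], which is where the worst-case factor of 2 above comes from (the 0.01 is just the weighting). A quick sanity check of both extremes:

```python
import torch
import torch.nn as nn

cos = nn.CosineSimilarity(dim=1)
x = torch.tensor([[1., 0.]])

# Opposite direction -> similarity -1 -> loss 2 (worst case).
worst = 1. - cos(x, torch.tensor([[-1., 0.]]))
# Same direction (scale doesn't matter) -> similarity 1 -> loss 0 (best case).
best = 1. - cos(x, torch.tensor([[2., 0.]]))
print(worst.item(), best.item())
```

Note that the loss is invariant to the magnitude of the tensors, so only the direction of x is trained in the example above.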