Hi, my apologies if my question is very naive. I am new to PyTorch.

I am trying to add a cosine similarity score to the cross entropy loss in such a way that the similarity score is maximized. I am confused between the following two versions:

loss = criterion(pred, gtruth) + 0.01 * (1 - cosineSim(pred_repj, gtruth_repj))
or
loss = criterion(pred, gtruth) + 0.01 * cosineSim(pred_repj, gtruth_repj)

Do I need to take care of any other aspects if I add the similarity score this way?

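Since you would like to maximize the cosine similarity, I would go with the first approach. Cosine similarity lies in [-1, 1], so in the worst case you’ll add 0.01 * 2 to the loss, and in the best (trained) case the term will be 1 - 1 = 0. The second version would do the opposite of what you want: minimizing the loss would push the similarity toward -1. Here is a small dummy example that just rotates a tensor toward a target: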

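import torch
import torch.nn as nn

# x is the trainable tensor, y is the fixed target direction
x = torch.tensor([[1., 1.]], requires_grad=True)
y = torch.tensor([[-1., 0.]])
criterion = nn.CosineSimilarity()
optimizer = torch.optim.SGD([x], lr=1.)

for _ in range(20):
    optimizer.zero_grad()
    # 1 - cos is 0 when x points along y and 2 when it points the opposite way
    loss = 1. - criterion(x, y)  # shape [1]; backward() works on a single element
    loss.backward()
    optimizer.step()
    print('x: {}, loss: {}'.format(x, loss.item()))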
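
To connect this back to the combined loss from the question, here is a minimal sketch of the first version. The batch size, class count, and projection size are made up for illustration, and I am assuming criterion is nn.CrossEntropyLoss and cosineSim behaves like F.cosine_similarity. One aspect to take care of: the cosine similarity comes back as one value per sample, so it has to be reduced (here with mean()) before it is added to the scalar cross entropy loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: 4 samples, 3 classes, 8-dimensional projections.
pred = torch.randn(4, 3, requires_grad=True)       # class logits
gtruth = torch.tensor([0, 2, 1, 1])                # class indices
pred_repj = torch.randn(4, 8, requires_grad=True)  # projected predictions
gtruth_repj = torch.randn(4, 8)                    # projected targets

criterion = nn.CrossEntropyLoss()  # assumed; the question does not say
weight = 0.01                      # weighting factor from the question

# One similarity value per sample; average to get a scalar penalty in [0, 2].
cos = F.cosine_similarity(pred_repj, gtruth_repj, dim=1)
sim_penalty = (1.0 - cos).mean()

loss = criterion(pred, gtruth) + weight * sim_penalty
loss.backward()

# nn.CosineEmbeddingLoss with a target of +1 computes the same mean(1 - cos).
alt_penalty = nn.CosineEmbeddingLoss()(pred_repj, gtruth_repj, torch.ones(4))
print(loss.item(), sim_penalty.item(), alt_penalty.item())
```

The 0.01 weight keeps the similarity term small relative to the cross entropy term, so it may be worth printing both parts separately during training to see whether the similarity term is actually having an effect.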