Creating a cosine-similarity loss function for a supervised learning task

I would like to build a loss function based on cosine similarity to cluster my (labeled) data in 2D space. The data passes through a NN that ends with two output neurons (x and y coordinates). So let's say x_i, t_i, y_i are the input, target, and output of the neural network. The target is one-hot encoded (classification), but the output is a pair of coordinates (regression).
To calculate the loss, I'd like to check the cosine similarity of each output against all other outputs of the dataset that have the same target, i.e. minimize the cosine angle between all data points sharing a target. That means the loss could not be computed at every training step, but only after the entire dataset has passed through the network once. If I built a custom loss function in PyTorch, it would need loops and if-else checks to collect all outputs with the same label. Is that even differentiable? Or can I just implement it in PyTorch and let the dynamic computation graph figure it out by itself? Answers appreciated!
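For what it's worth, the per-batch version of this loss needs neither loops nor if-else checks: boolean masking and `torch.nn.functional.normalize` are differentiable, so autograd handles it. A minimal sketch (assuming `labels` are integer class indices rather than one-hot, and the hypothetical function name is my own):

```python
import torch
import torch.nn.functional as F

def same_label_cosine_loss(outputs, labels):
    """Mean (1 - cosine similarity) over all pairs of outputs in the
    batch that share a label, computed without Python loops."""
    # Normalize rows so pairwise dot products are cosine similarities.
    z = F.normalize(outputs, dim=1)            # (B, 2)
    sim = z @ z.t()                            # (B, B) pairwise cosine similarity
    # Mask selecting pairs with the same label, excluding self-pairs.
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    same.fill_diagonal_(False)
    if not same.any():
        return outputs.sum() * 0.0             # no positive pairs in this batch
    return (1.0 - sim[same]).mean()            # 0 when same-label outputs are collinear
```

Backpropagating this through the *whole* dataset at once is the part that doesn't scale, as the answer below notes.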

So you (likely, except for tiny datasets) cannot just backpropagate through everything, because you would exhaust memory before that. I'd look at adapting self-supervised learning methods (like MoCo) - in a nutshell they

  • maximize batch size as far as they can,
  • use a longer history of past outputs (without computing gradients for this part) when computing the gradients for the current batch.
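The second bullet can be sketched as a small memory queue: past outputs are stored detached, so gradients only flow through the current batch while the cosine comparison still spans a longer history. This is a simplified sketch of the MoCo-style idea, not MoCo itself (the class name and queue handling are my own assumptions):

```python
import torch
import torch.nn.functional as F

class CosineMemoryLoss:
    """Compare the current batch against a fixed-size history of past
    (detached) normalized outputs with matching labels."""
    def __init__(self, queue_size=1024):
        self.queue_size = queue_size
        self.feats = None      # (Q, D) detached normalized outputs
        self.labels = None     # (Q,) their labels

    def __call__(self, outputs, labels):
        z = F.normalize(outputs, dim=1)
        loss = z.sum() * 0.0                   # zero loss that stays on the graph
        if self.feats is not None:
            sim = z @ self.feats.t()           # (B, Q) cosine similarities
            same = labels.unsqueeze(1) == self.labels.unsqueeze(0)
            if same.any():
                loss = (1.0 - sim[same]).mean()
        # Enqueue the current batch without gradients (the "history" part).
        with torch.no_grad():
            f = z.detach()
            self.feats = f if self.feats is None else \
                torch.cat([self.feats, f])[-self.queue_size:]
            self.labels = labels if self.labels is None else \
                torch.cat([self.labels, labels])[-self.queue_size:]
        return loss
```

Gradients reach the network only through `z`; the queue is a constant as far as autograd is concerned, which is what keeps memory bounded.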

Best regards


Thank you for your answer, it led me to contrastive learning and that did it for me! Greetings from Munich :slight_smile: