I’m trying to add a regularization term based on the weights of two networks with the same architecture. The first part of the code is something like the following:
```python
logits1 = model1(data)
logits2 = model2(data)

loss_fn = nn.CrossEntropyLoss()
loss1 = loss_fn(logits1, target)
loss2 = loss_fn(logits2, target)
loss = loss1 + loss2
```
After summing the two losses, I want to add a regularization term based on the cosine similarity of the parameters of the two networks. The idea is to push the networks toward different weights, i.e., to make them (close to) orthogonal. The formula for this similarity is the same as in `torch.nn.CosineSimilarity()`.
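For reference, here is a minimal sketch showing that `torch.nn.CosineSimilarity` on two flat vectors matches the usual formula `a·b / (‖a‖‖b‖)` (the vectors here are arbitrary stand-ins for flattened parameters):

```python
import torch
import torch.nn as nn

# Two arbitrary vectors standing in for flattened network parameters.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])

cos = nn.CosineSimilarity(dim=0)
sim = cos(a, b)

# Manual computation of the same formula: a.b / (||a|| * ||b||)
manual = torch.dot(a, b) / (a.norm() * b.norm())
print(torch.allclose(sim, manual))  # True
```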
Initially, I thought of storing the parameters in two lists, calculating the cosine similarity, and then adding this to the combined loss. Thus, something like:
```python
# Flatten each network's parameters into a single 1-D vector,
# since CosineSimilarity expects tensors, not lists of tensors.
params1 = torch.cat([p.view(-1) for p in model1.parameters()])
params2 = torch.cat([p.view(-1) for p in model2.parameters()])

cos = nn.CosineSimilarity(dim=0)
regularization = beta * (cos(params1, params2) ** 2)
loss += regularization
```
However, I have the impression that just adding the regularization term to the loss wouldn’t actually affect the gradients, because no computational graph is created. Am I right? If so, is there any suggestion on how to implement this properly?
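One quick way to sanity-check this concern is to inspect `grad_fn` on the regularization term: if it is set, the term is part of the autograd graph. A small self-contained sketch (the two `nn.Linear` models and `beta` value are arbitrary placeholders, not the actual networks):

```python
import torch
import torch.nn as nn

# Two tiny models with identical architecture (hypothetical stand-ins).
model1 = nn.Linear(4, 2)
model2 = nn.Linear(4, 2)

# torch.cat on views of the parameters keeps the result connected
# to the leaf parameter tensors, so gradients can flow back to them.
params1 = torch.cat([p.view(-1) for p in model1.parameters()])
params2 = torch.cat([p.view(-1) for p in model2.parameters()])

cos = nn.CosineSimilarity(dim=0)
beta = 0.1
regularization = beta * cos(params1, params2) ** 2

# grad_fn being set means the term is in the autograd graph and
# backward() will propagate into both models' weights.
print(regularization.grad_fn is not None)  # True
regularization.backward()
print(model1.weight.grad is not None)      # True
```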