I’m trying to add a regularization term based on the weights of two networks with the same architecture. The first part of the code is something like the following:
```python
logits1 = model1(data)
logits2 = model2(data)

loss_fn = nn.CrossEntropyLoss()
loss1 = loss_fn(logits1, target)
loss2 = loss_fn(logits2, target)
loss = loss1 + loss2
```
After summing the two losses, I want to add a regularization term based on the cosine similarity of the parameters of the two networks. The idea is to push the networks toward different weights, i.e., to make them (close to) orthogonal. The formula for this similarity is the same as in `torch.nn.CosineSimilarity()`.
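For reference, here is a minimal sketch showing that `torch.nn.CosineSimilarity` on two flat vectors matches the usual formula `a·b / (‖a‖‖b‖)` (the vectors here are arbitrary stand-ins for flattened parameters):

```python
import torch
import torch.nn as nn

# Two arbitrary vectors standing in for flattened network parameters.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])

cos = nn.CosineSimilarity(dim=0)
sim = cos(a, b)

# Manual computation of the same formula: a.b / (||a|| * ||b||)
manual = torch.dot(a, b) / (a.norm() * b.norm())
print(torch.allclose(sim, manual))  # True
```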
Initially, I thought of storing the parameters in two lists, calculating the cosine similarity, and then adding this to the combined loss. Thus, something like:
```python
# Flatten each network's parameters into a single 1-D vector,
# since CosineSimilarity expects tensors, not lists of tensors.
params1 = torch.cat([p.view(-1) for p in model1.parameters()])
params2 = torch.cat([p.view(-1) for p in model2.parameters()])

cos = nn.CosineSimilarity(dim=0)
regularization = beta * (cos(params1, params2) ** 2)
loss += regularization
```
However, I have the impression that just adding the regularization term to the loss wouldn’t actually affect the gradients, because no computational graph is created. Am I right? If so, is there any suggestion on how to implement this properly?
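One quick way to sanity-check this concern is to inspect `grad_fn` on the regularization term: if it is set, the term is part of the autograd graph. A small self-contained sketch (the two `nn.Linear` models and `beta` value are arbitrary placeholders, not the actual networks):

```python
import torch
import torch.nn as nn

# Two tiny models with identical architecture (hypothetical stand-ins).
model1 = nn.Linear(4, 2)
model2 = nn.Linear(4, 2)

# torch.cat on views of the parameters keeps the result connected
# to the leaf parameter tensors, so gradients can flow back to them.
params1 = torch.cat([p.view(-1) for p in model1.parameters()])
params2 = torch.cat([p.view(-1) for p in model2.parameters()])

cos = nn.CosineSimilarity(dim=0)
beta = 0.1
regularization = beta * cos(params1, params2) ** 2

# grad_fn being set means the term is in the autograd graph and
# backward() will propagate into both models' weights.
print(regularization.grad_fn is not None)  # True
regularization.backward()
print(model1.weight.grad is not None)      # True
```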