I’m trying to add a regularization term based on the weights of two networks with the same architecture. The first part of the code is something like the following:
```python
logits1 = model1(data)
logits2 = model2(data)

loss_fn = nn.CrossEntropyLoss()
loss1 = loss_fn(logits1, target)
loss2 = loss_fn(logits2, target)
loss = loss1 + loss2
```
After summing the two losses, I want to add a regularization term based on the cosine similarity of the parameters of the two networks. The idea is to push the networks toward different weights, i.e., to make them (close to) orthogonal. The formula for this similarity is the same as in `torch.nn.CosineSimilarity()`.
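For reference, here is a minimal sketch showing that `torch.nn.CosineSimilarity` on two flat vectors matches the usual formula `a·b / (‖a‖‖b‖)` (the vectors here are arbitrary stand-ins for flattened parameters):

```python
import torch
import torch.nn as nn

# Two arbitrary vectors standing in for flattened network parameters.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])

cos = nn.CosineSimilarity(dim=0)
sim = cos(a, b)

# Manual computation of the same formula: a.b / (||a|| * ||b||)
manual = torch.dot(a, b) / (a.norm() * b.norm())
print(torch.allclose(sim, manual))  # True
```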
Initially, I thought of storing the parameters in two lists, calculating the cosine similarity, and then adding this to the combined loss. Thus, something like:
```python
# Flatten each network's parameters into a single 1-D vector,
# since CosineSimilarity expects tensors, not lists of tensors.
params1 = torch.cat([p.view(-1) for p in model1.parameters()])
params2 = torch.cat([p.view(-1) for p in model2.parameters()])

cos = nn.CosineSimilarity(dim=0)
regularization = beta * (cos(params1, params2) ** 2)
loss += regularization
```
However, I have the impression that just adding the regularization term to the loss wouldn’t actually affect the gradients, because no computational graph is created. Am I right? If so, is there any suggestion on how to implement this properly?
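One quick way to sanity-check this concern is to inspect `grad_fn` on the regularization term: if it is set, the term is part of the autograd graph. A small self-contained sketch (the two `nn.Linear` models and `beta` value are arbitrary placeholders, not the actual networks):

```python
import torch
import torch.nn as nn

# Two tiny models with identical architecture (hypothetical stand-ins).
model1 = nn.Linear(4, 2)
model2 = nn.Linear(4, 2)

# torch.cat on views of the parameters keeps the result connected
# to the leaf parameter tensors, so gradients can flow back to them.
params1 = torch.cat([p.view(-1) for p in model1.parameters()])
params2 = torch.cat([p.view(-1) for p in model2.parameters()])

cos = nn.CosineSimilarity(dim=0)
beta = 0.1
regularization = beta * cos(params1, params2) ** 2

# grad_fn being set means the term is in the autograd graph and
# backward() will propagate into both models' weights.
print(regularization.grad_fn is not None)  # True
regularization.backward()
print(model1.weight.grad is not None)      # True
```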