I’m looking to create a weighted loss function where the weights always have norm 1 and are trainable. I see two main avenues for accomplishing this (described below). I have invested significant energy in only the first so far, because I am not comfortable enough with PyTorch to make much progress on the second (yet). I’m posting to ask for help with getting either of the avenues working (and initially, even confirmation that what I want to do is possible).
For the record, it seems at first glance that what I’m doing is similar to what’s done here, and certainly the linked code in this repo has informed my attempts. Still, this code does not account for the normalization of the weights.
Avenue 1: hard-renormalizing the weights
I have tried to compute regular gradients and then simply normalize the weights to have norm 1. In this setting, I run into the following error:
```
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```
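For what it’s worth, the error seems reproducible in isolation. A minimal sketch (my reconstruction with toy shapes, not my actual training loop) that triggers the same message:

```python
import torch
import torch.nn.functional as F

w = torch.rand(3, requires_grad=True)
try:
    for step in range(2):
        # Re-normalizing the *output* of the previous F.normalize keeps
        # extending a single graph, so the second backward() revisits
        # buffers that the first backward() already freed.
        w = F.normalize(w, p=2, dim=0)
        loss = (w ** 2).sum()
        loss.backward()  # raises on the second iteration
except RuntimeError as e:
    print(e)  # "Trying to backward through the graph a second time ..."
```

If I understand correctly, after the first iteration `w` is no longer a leaf tensor but an intermediate node of an already-freed graph, which is what the error is complaining about.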
To do this hard-renormalization, I have tried something like this:
```python
class WeightedLoss(nn.Module):
    def forward(self, output, input, weights):
        return (((output - input) ** 2 * weights)
                .view(input.size(0), -1)
                .sum(dim=1, keepdim=True)
                .mean())

weights = torch.empty(n_weights).uniform_(0.9, 1.1)
weights.requires_grad = True

for epoch in range(n_epochs):
    for input in dataloader:
        weights = F.normalize(weights, p=2, dim=0)
        output = model(input)
        loss = criterion(output, input, weights)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
This yields the RuntimeError mentioned above. I have also tried something like this:
```python
class WeightedLoss(nn.Module):
    def __init__(self, n_weights):
        super().__init__()
        self.n_weights = n_weights
        self.weights = nn.Parameter(torch.Tensor(n_weights))
        nn.init.uniform_(self.weights, 0.9, 1.1)

    def forward(self, output, input):
        self.weights = F.normalize(self.weights, p=2, dim=0)
        return (((output - input) ** 2 * self.weights)
                .view(input.size(0), -1)
                .sum(dim=1, keepdim=True)
                .mean())

for epoch in range(n_epochs):
    for input in dataloader:
        output = model(input)
        loss = criterion(output, input)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
Unfortunately, this second snippet won’t work because `F.normalize` does not return a parameter, so the assignment to `self.weights` fails. I believe a variation on this code (one in which `self.weights` remains a parameter) either gives the same RuntimeError or breaks somewhere else. I am not sure how to restructure either of these snippets in a way that allows `retain_graph=True` to make sense without training becoming exceptionally inefficient.
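For concreteness, here is one such variation I can imagine (normalizing into a local tensor inside `forward`, so that `self.weights` remains a leaf parameter and each call builds a fresh graph). I am not certain this has the exact semantics I want:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedLoss(nn.Module):
    def __init__(self, n_weights):
        super().__init__()
        self.weights = nn.Parameter(torch.empty(n_weights).uniform_(0.9, 1.1))

    def forward(self, output, input):
        # Normalize into a local tensor: self.weights stays a leaf
        # nn.Parameter, and the normalization is re-done (and its graph
        # re-built, then freed) on every forward pass.
        w = F.normalize(self.weights, p=2, dim=0)
        return (((output - input) ** 2 * w)
                .view(input.size(0), -1)
                .sum(dim=1, keepdim=True)
                .mean())
```

Note that after `optimizer.step()` the stored parameter itself may drift off the unit sphere; only the normalized copy used in the loss has norm 1.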
Avenue 2: projecting the gradient updates
Another perfectly acceptable option that I don’t know how to implement would be the following: instantiate the weights (either as a tensor with `requires_grad = True`, or as an `nn.Parameter` of `WeightedLoss`); then write a custom `backward` method for `WeightedLoss` that computes the usual gradient for `self.weights` and projects/normalizes it such that `weights - alpha * weights.grad` still has 2-norm equal to 1. In this case, I’m entirely at a loss for how to write that `backward` function, and I have had trouble finding online examples pertaining to my particular use case.
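To make the projection idea concrete, this is roughly the update rule I have in mind, written here as a hypothetical manual optimizer step (`projected_step` is my name, not an existing API) rather than as a custom `backward`. It assumes the incoming `weights` already have norm 1; strictly speaking it projects the gradient onto the tangent space and then retracts back to the sphere, rather than making `weights - alpha * weights.grad` itself have norm 1:

```python
import torch

def projected_step(weights, grad, alpha):
    # Remove the radial component of the gradient (projection onto the
    # tangent space of the unit sphere at `weights`), take the gradient
    # step, then retract back onto the sphere by re-normalizing.
    tangent = grad - torch.dot(grad, weights) * weights
    stepped = weights - alpha * tangent
    return stepped / stepped.norm(p=2)
```

What I don’t know is how to package this into a custom `backward` (or `torch.autograd.Function`) so that the optimizer applies it automatically.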
Thank you to anyone with useful insight on this problem.