I’m looking to create a weighted loss function where the weights always have norm 1 and are trainable. I see two main avenues for accomplishing this (described below). I have invested significant energy in only the first so far, because I am not comfortable enough with PyTorch to make much progress on the second (yet). I’m posting to ask for help with getting either of the avenues working (and initially, even confirmation that what I want to do is possible).
For the record, it seems at first glance that what I’m doing is similar to what’s done here and certainly the linked code in this repo has informed my attempts. Still, this code does not account for the normalization of the weights parameter.
Avenue 1: hard-renormalizing the weights
I have tried to compute regular gradients and then simply normalize the weights to have norm 1. In this setting, I run into the following error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
To do this hard-renormalization, I have tried something like this:
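import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedLoss(nn.Module):
    ...
    def forward(self, output, input, weights):
        # Weighted squared error, summed per sample, then averaged over the batch
        return (((output - input)**2 * weights)
                .view(input.size(0), -1)
                .sum(dim=1, keepdim=True)
                .mean())

# n_weights: number of weights (one per output feature)
weights = torch.empty(n_weights).uniform_(.9, 1.1)
weights.requires_grad = True
for epoch in range(n_epochs):
    for input in dataloader:
        # Rebinding `weights` here makes it a non-leaf tensor whose graph
        # reaches back into the previous iteration
        weights = F.normalize(weights, p=2, dim=0)
        output = model(input)
        loss = criterion(output, input, weights)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()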
This yields the RuntimeError mentioned above. I have also tried something like this:
class WeightedLoss(nn.Module):
    def __init__(self, n_weights):
        super().__init__()
        self.n_weights = n_weights
        self.weights = nn.Parameter(torch.empty(n_weights))
        nn.init.uniform_(self.weights, .9, 1.1)

    def forward(self, output, input):
        # F.normalize returns a plain Tensor, so this assignment to a
        # registered nn.Parameter attribute is rejected
        self.weights = F.normalize(self.weights, p=2, dim=0)
        return (((output - input)**2 * self.weights)
                .view(input.size(0), -1)
                .sum(dim=1, keepdim=True)
                .mean())

for epoch in range(n_epochs):
    for input in dataloader:
        output = model(input)
        loss = criterion(output, input)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Unfortunately, this second snippet won’t work because F.normalize does not return a parameter. I believe a variation on this code (so that self.weights remains a parameter) either gives the same RuntimeError or fails somewhere else… I am not sure how to restructure either snippet so that retain_graph=True makes sense without training becoming exceptionally inefficient.
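For reference, here is a minimal sketch of how the hard-renormalization might be made to work without retain_graph=True. The idea is to keep self.weights as a leaf nn.Parameter and rescale it in place under torch.no_grad() after each optimizer step, so the rescaling never enters the autograd graph. The renormalize method, the nn.Linear stand-in model, and the toy data are assumptions made for illustration, not part of the code above.

```python
import torch
import torch.nn as nn


class WeightedLoss(nn.Module):
    def __init__(self, n_weights):
        super().__init__()
        self.weights = nn.Parameter(torch.empty(n_weights).uniform_(0.9, 1.1))
        self.renormalize()

    @torch.no_grad()
    def renormalize(self):
        # Hypothetical helper: in-place rescale outside autograd, so
        # self.weights stays a leaf nn.Parameter with 2-norm equal to 1.
        self.weights.div_(self.weights.norm(p=2))

    def forward(self, output, input):
        return (((output - input) ** 2 * self.weights)
                .view(input.size(0), -1)
                .sum(dim=1)
                .mean())


torch.manual_seed(0)
n_features = 4
model = nn.Linear(n_features, n_features)  # stand-in for the real model
criterion = WeightedLoss(n_features)
# The loss weights must be handed to the optimizer alongside the model parameters.
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(criterion.parameters()), lr=0.1
)
data = [torch.randn(8, n_features) for _ in range(5)]  # toy batches

for input in data:
    output = model(input)
    loss = criterion(output, input)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    criterion.renormalize()  # pull the weights back onto the unit sphere

final_norm = criterion.weights.norm(p=2).item()
```

Because the rescaling happens after optimizer.step() and outside the graph, each iteration builds a fresh graph from the same leaf parameter, which is what avoids backpropagating through an already-freed graph.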
Avenue 2: projecting the gradient updates
Another perfectly acceptable option that I don’t know how to implement would be the following: instantiate the weights (either as a Tensor with requires_grad=True, or as an nn.Parameter in WeightedLoss); write a custom backward method for WeightedLoss that computes the usual gradient for self.weights and then projects/normalizes it such that weights - alpha * weights.grad still has 2-norm equal to 1. In this case, I’m entirely at a loss for how to write the backward function, and have had trouble finding online examples pertaining to my particular use case.
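As a rough sketch of the projection idea, a tensor hook registered with register_hook may be simpler than a custom backward method: the hook receives the usual gradient and can return a modified one. One caveat is that if the projected gradient g is orthogonal to unit-norm weights w, then ||w - alpha * g||^2 = 1 + alpha^2 * ||g||^2, so a plain gradient step can only keep the norm approximately equal to 1. The sketch below therefore projects the gradient onto the tangent space of the unit sphere and still renormalizes after the step. The function name project_to_tangent and the random stand-in tensors are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
weights = torch.empty(4).uniform_(0.9, 1.1)
weights /= weights.norm(p=2)       # start on the unit sphere
weights.requires_grad_(True)


def project_to_tangent(grad):
    # Hypothetical hook: remove the gradient component along the current weights,
    # leaving only the part tangent to the unit sphere.
    w = weights.detach()
    return grad - torch.dot(grad, w) * w


weights.register_hook(project_to_tangent)

optimizer = torch.optim.SGD([weights], lr=0.1)
output = torch.randn(8, 4)  # stand-in for model(input)
input = torch.randn(8, 4)

loss = ((output - input) ** 2 * weights).sum(dim=1).mean()
optimizer.zero_grad()
loss.backward()

# The stored gradient should have (numerically) no component along the weights.
radial = torch.dot(weights.grad, weights.detach()).item()

optimizer.step()
norm_after_step = weights.norm(p=2).item()  # slightly above 1 after a tangent step

with torch.no_grad():
    weights /= weights.norm(p=2)            # retraction back onto the sphere
norm_final = weights.norm(p=2).item()
```

With a small learning rate the drift in the norm after each step is second order in alpha, so the final renormalization only makes a small correction.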
Thank you to anyone with useful insight on this problem.