I am not too familiar with the intricacies of autograd, and I ran into this issue. Wondering if anyone can spot what I messed up from a quick look.
I am running python 3.8.5 and torch 1.5.1.
I am embedding a loss inside another loss for regularization purposes. Here's what the custom forward for my loss looks like:
```python
def forward(self, model, sample, reduce=True):
    regterm = 0
    for layer in model.encoder.layers:
        if isinstance(layer, Mycustomlayer):
            regterm += layer.compute_reg()
    loss, sample_size, logging_output = self.original_loss(model, sample, reduce=reduce)
    loss = loss + regterm
    return loss, sample_size, logging_output
```
where original_loss eventually returns a construct similar to F.l1_loss, and I add a regularization term on top. The layer's compute_reg() looks like this:
```python
def compute_reg(self, *args):
    probs = self.cdf_qz()
    reg = torch.sum((1 - probs) * self.penalty)
    total_reg = torch.mean(reg)
    return total_reg

def cdf_qz(self):
    low, high = stretch_limits
    assert low < 0
    xn = (0 - low) / (high - low)
    assert xn != 1
    logits = math.log(xn) - math.log(1 - xn)
    return torch.sigmoid(logits * lambda_multiplier - self.a).clamp(min=self.eps, max=1 - self.eps)
```
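A minimal standalone version of the same computation (the constants below are stand-ins for stretch_limits, lambda_multiplier, and penalty, not my real values) shows one place where the gradient can vanish: clamp passes gradients while the sigmoid is unsaturated, but returns exactly zero gradient once the value is pinned at eps or 1 - eps.

```python
import torch

# Stand-ins for the question's hyperparameters - not the real values.
eps = 1e-6
lambda_multiplier = 1.0
penalty = 2.0
logits = 0.7  # stand-in for math.log(xn) - math.log(1 - xn)

# Unsaturated case: clamp is a no-op and the gradient w.r.t. `a` flows.
a = torch.tensor(0.5, requires_grad=True)
probs = torch.sigmoid(logits * lambda_multiplier - a).clamp(min=eps, max=1 - eps)
torch.sum((1 - probs) * penalty).backward()
print(a.grad)  # nonzero

# Saturated case: sigmoid(0.7 - 30) is far below eps, so clamp pins the
# output at eps and its gradient w.r.t. `b` is exactly zero.
b = torch.tensor(30.0, requires_grad=True)
probs = torch.sigmoid(logits * lambda_multiplier - b).clamp(min=eps, max=1 - eps)
torch.sum((1 - probs) * penalty).backward()
print(b.grad)  # tensor(0.)
```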
On the backward pass I get zero gradients for self.a, which should be a parameter of the network. Am I making any obvious blunders?
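For context, this is the kind of diagnostic loop I run after loss.backward() to see which parameters actually receive gradients (sketched on a toy nn.Linear here rather than my actual model):

```python
import torch
import torch.nn as nn

# Toy stand-in model; in my setup this would be the real encoder.
model = nn.Linear(3, 1)
loss = model(torch.randn(2, 3)).pow(2).sum()
loss.backward()

# Flag parameters whose gradient is missing or identically zero.
for name, p in model.named_parameters():
    if p.grad is None or p.grad.abs().sum().item() == 0:
        print(name, "-> no/zero gradient")
    else:
        print(name, "-> nonzero gradient")
```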