Curiously I don't get any gradients in one layer

Hi all,
I am not quite sure about the intricacies of autograd, and I ran into this issue. Wondering if anyone would know what I messed up from a quick look.
I am running python 3.8.5 and torch 1.5.1.

I am embedding a loss inside another loss for regularization purposes. Here is what the custom forward of my loss looks like:

    def forward(self, model, sample, reduce=True):
        # Accumulate the regularization term from every custom layer in the encoder
        regterm = 0
        for layer in model.encoder.layers:
            if isinstance(layer, Mycustomlayer):
                regterm += layer.compute_reg()
        # Original task loss, with the regularization term added on top
        loss, sample_size, logging_output = self.original_loss(model, sample, reduce=reduce)
        loss = loss + regterm
        return loss, sample_size, logging_output

where original_loss eventually returns a construct similar to F.l1_loss, and I am adding a regularization term on top of it. The layer's compute_reg() looks like this:

    def compute_reg(self, *args):
        # Regularization term: penalty weighted by (1 - probs), reduced to a scalar
        probs = self.cdf_qz()
        reg = torch.sum((1 - probs) * self.penalty)
        total_reg = torch.mean(reg)
        return total_reg

    def cdf_qz(self):
        low, high = stretch_limits
        assert low < 0
        xn = (0 - low) / (high - low)
        assert xn != 1
        logits = math.log(xn) - math.log(1 - xn)
        # self.a is the parameter that should receive gradients; the sigmoid output is clamped to [eps, 1 - eps]
        return torch.sigmoid(logits * lambda_multiplier - self.a).clamp(min=self.eps, max=1 - self.eps)
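
For reference, a minimal sketch of how the reg term can be checked for graph attachment (layer here is a placeholder for one Mycustomlayer instance from model.encoder.layers):

    # layer is a placeholder for one Mycustomlayer instance
    regterm = layer.compute_reg()
    print(regterm.requires_grad, regterm.grad_fn)  # expect True and a non-None grad_fn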

I get zero gradients for self.a on the backward pass, even though it should be a parameter of the network. Am I making any obvious blunders?
Thanks!

Hi,

The one thing I would check is the clamp op: it generates zero gradients for all the values that actually get clamped. So you want to make sure that the initial value of a keeps the sigmoid output within the linear (un-clamped) region of the clamp.
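
As a small standalone illustration (not your code), clamp passes gradients only for values that stay inside the bounds:

    import torch

    x = torch.tensor([-2.0, 0.0, 0.5, 2.0], requires_grad=True)
    y = x.clamp(min=0.1, max=0.9)  # -2.0 and 0.0 are clamped up to 0.1, 2.0 is clamped down to 0.9
    y.sum().backward()
    print(x.grad)  # tensor([0., 0., 1., 0.]) - only the un-clamped 0.5 receives a gradient

Applied to your snippet: if sigmoid(logits * lambda_multiplier - self.a) saturates outside [eps, 1 - eps], the clamp cuts the gradient before it reaches self.a.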

Otherwise, you also want to make sure that both a.is_leaf and a.requires_grad are True (is_leaf is an attribute, not a method).
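
Something along these lines (model is a placeholder for your network, and the name filter is hypothetical) will show whether a is registered and trainable:

    # model is a placeholder for your network; adjust the filter to your parameter's name
    for name, p in model.named_parameters():
        if name.endswith(".a"):
            print(name, p.is_leaf, p.requires_grad)  # both flags should be True

If a does not show up there at all, it is not registered as an nn.Parameter, so an optimizer built from model.parameters() would never update it.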

Thanks loads for the tip! I will check the clamp; a.is_leaf and a.requires_grad are both in order, I had double-checked those before.
Cheers!