Hi all,

I'm not very familiar with the intricacies of autograd and have run into an issue. I'm wondering whether anyone can spot what I messed up from a quick look.

I am running Python 3.8.5 and torch 1.5.1.

I am wrapping one loss inside another for regularization purposes. Here's what the custom forward for my loss looks like:

```
def forward(self, model, sample, reduce=True):
    regterm = 0
    for layer in model.encoder.layers:
        if isinstance(layer, Mycustomlayer):
            regterm += layer.compute_reg()
    loss, sample_size, logging_output = self.original_loss(model, sample, reduce=reduce)
    loss = loss + regterm
    return loss, sample_size, logging_output
```

Here, original_loss eventually returns something similar to F.l1_loss, and I add a regularization term on top of it. The layer's compute_reg() looks like this:

```
def compute_reg(self, *args):
    probs = self.cdf_qz()
    reg = torch.sum((1 - probs) * self.penalty)
    # reg is already a scalar here, so the mean leaves it unchanged
    total_reg = torch.mean(reg)
    return total_reg

def cdf_qz(self):
    # stretch_limits and lambda_multiplier are not defined in this snippet
    low, high = stretch_limits
    assert low < 0
    xn = (0 - low) / (high - low)
    assert xn != 1
    logits = math.log(xn) - math.log(1 - xn)
    return torch.sigmoid(logits * lambda_multiplier - self.a).clamp(min=self.eps, max=1 - self.eps)
```
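
For anyone who wants to reproduce this, here is a minimal standalone sketch of the regularization path. It is not the actual layer: `ToyGateLayer`, `STRETCH_LIMITS`, `LAMBDA_MULTIPLIER`, and the values chosen for `penalty` and `eps` are all assumptions made for illustration, since the post does not show them. It also assumes `a` is registered as an `nn.Parameter`.

```python
import math

import torch
import torch.nn as nn

# Hypothetical constants; the real values are not shown in the post.
STRETCH_LIMITS = (-0.1, 1.1)
LAMBDA_MULTIPLIER = 2.0 / 3.0


class ToyGateLayer(nn.Module):
    """Hypothetical stand-in for Mycustomlayer, reduced to the reg path."""

    def __init__(self, size, penalty=1.0, eps=1e-6):
        super().__init__()
        # Assumption: `a` is an nn.Parameter, so autograd tracks it.
        self.a = nn.Parameter(torch.zeros(size))
        self.penalty = penalty
        self.eps = eps

    def cdf_qz(self):
        low, high = STRETCH_LIMITS
        xn = (0 - low) / (high - low)
        logits = math.log(xn) - math.log(1 - xn)
        return torch.sigmoid(logits * LAMBDA_MULTIPLIER - self.a).clamp(
            min=self.eps, max=1 - self.eps
        )

    def compute_reg(self):
        return torch.sum((1 - self.cdf_qz()) * self.penalty)


layer = ToyGateLayer(size=4)
reg = layer.compute_reg()
reg.backward()

# With these toy values the sigmoid output is well inside [eps, 1 - eps],
# so the gradient reaches `a`.
grad_is_nonzero = bool(layer.a.grad.abs().sum() > 0)
print(reg.grad_fn is not None, grad_is_nonzero)
```

With these toy values the gradient does reach `a`, which suggests the structure of the computation itself is fine and that the cause is more likely to lie in the actual values or in how `self.a` is registered.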

On the backward pass I get zero gradients for self.a, which should be a parameter of the network. Am I making any obvious blunders?
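
One thing that might be worth ruling out is saturation in the `clamp` call: when the sigmoid output falls outside `[eps, 1 - eps]`, `clamp` returns a constant and passes back a gradient of exactly zero. The sketch below only illustrates that behaviour in isolation; the value of `eps` and the two test values of `a` are assumptions, not taken from the post.

```python
import torch

eps = 1e-6  # hypothetical value standing in for self.eps

# Moderate value: sigmoid(0) = 0.5 lies strictly inside [eps, 1 - eps],
# so clamp is the identity and the gradient flows through.
a_ok = torch.tensor([0.0], requires_grad=True)
torch.sigmoid(-a_ok).clamp(min=eps, max=1 - eps).sum().backward()

# Large value: sigmoid(-30) is far below eps, so clamp outputs the
# constant eps and blocks the gradient.
a_sat = torch.tensor([30.0], requires_grad=True)
torch.sigmoid(-a_sat).clamp(min=eps, max=1 - eps).sum().backward()

print(a_ok.grad)   # nonzero
print(a_sat.grad)  # exactly zero
```

If `self.a.grad` turns out to be `None` rather than a tensor of zeros, that would point to a different cause, such as `self.a` not taking part in the graph at all.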

Thanks!