Hi, I am new to PyTorch and have a question.

In my model I define a parameter (with nn.Parameter), let's call it M. The model has some lower-level layers, and the output of the last layer is combined with M to compute the results. I loop over the batch like this:

```
results = []
for layer_output in layer_output_batch:
    result = some_function(layer_output, M)
    results.append(result)
results = torch.cat(results)
```
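
For reference, M is defined roughly like this inside the model (the layer structure, shapes, and names here are simplified placeholders, not my actual code):

```
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # lower-level layers (placeholder for the real ones)
        self.layers = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
        # the parameter in question
        self.M = nn.Parameter(torch.randn(8, 8))

    def forward(self, x):
        # produces the per-sample outputs that go into the loop above
        return self.layers(x)
```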

Then I use *results* to compute the loss.

After calling backward(), I found that the layers do get updated. However, M stays the same, and its gradient is always None.
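
Concretely, this is roughly what I see after the backward pass (criterion, targets, and optimizer are simplified placeholders):

```
loss = criterion(results, targets)
loss.backward()
optimizer.step()

print(model.M.grad)                  # always prints None
for p in model.layers.parameters():
    print(p.grad)                    # these are populated, and the layers update
```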

What did I do wrong?

I understand that people usually do not loop over the batch, but my function is a little too complicated to vectorize into a single high-dimensional tensor operation.