Hi, I am new to PyTorch and have a question.

In my model I define a parameter (with nn.Parameter), let's call it M. The model has some lower-level layers, and the output of the last layer is combined with M to compute the results. I loop over the batch like this:

```
results = []
for layer_output in layer_output_batch:
    result = some_function(layer_output, M)
    results.append(result)
results = torch.cat(results)
```
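
For reference, M is defined roughly like this inside the model (the layer structure, shapes, and names here are simplified placeholders, not my actual code):

```
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # lower-level layers (placeholder for the real ones)
        self.layers = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
        # the parameter in question
        self.M = nn.Parameter(torch.randn(8, 8))

    def forward(self, x):
        # produces the per-sample outputs that go into the loop above
        return self.layers(x)
```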

Then I use *results* to compute the loss.

After calling backward(), I found that the layers do get updated. However, M stays the same, and its gradient is always None.
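
Concretely, this is roughly what I see after the backward pass (criterion, targets, and optimizer are simplified placeholders):

```
loss = criterion(results, targets)
loss.backward()
optimizer.step()

print(model.M.grad)                  # always prints None
for p in model.layers.parameters():
    print(p.grad)                    # these are populated, and the layers update
```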

What did I do wrong?

I understand that people usually do not loop over the batch, but my function is a little too complicated to vectorize into a single high-dimensional tensor operation.