Hi, I am new to PyTorch and have a question.

In my model I define a parameter (with nn.Parameter), let's call it M. The model has some lower-level layers, and the output of the last layer is combined with M to compute the results. I loop over the batch like this:

```
results = []
for layer_output in layer_output_batch:
    result = some_function(layer_output, M)
    results.append(result)
results = torch.cat(results)
```
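
For reference, M is defined roughly like this inside the model (the layer structure, shapes, and names here are simplified placeholders, not my actual code):

```
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # lower-level layers (placeholder for the real ones)
        self.layers = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
        # the parameter in question
        self.M = nn.Parameter(torch.randn(8, 8))

    def forward(self, x):
        # produces the per-sample outputs that go into the loop above
        return self.layers(x)
```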

Then I use *results* to compute the loss.

After calling backward(), I found that the layers do get updated. However, M stays the same, and its gradient is always None.
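
Concretely, this is roughly what I see after the backward pass (criterion, targets, and optimizer are simplified placeholders):

```
loss = criterion(results, targets)
loss.backward()
optimizer.step()

print(model.M.grad)                  # always prints None
for p in model.layers.parameters():
    print(p.grad)                    # these are populated, and the layers update
```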

What did I do wrong?

I understand that people usually do not loop over the batch, but my function is a little too complicated to vectorize into a single high-dimensional tensor operation.