Parameters are not updated

I am implementing the review net (link: https://github.com/kimiyoung/review_net/blob/master/image_caption_offline/reason_att_copy.lua ). It is just an ordinary encoder-decoder framework plus review steps. A review step is similar to a decoder step, but there is no input and the weights of the LSTMs are not shared. In my implementation, it seems that the parameters of the LSTMs in the review steps are not updated during training. If the review steps are removed, the code works fine and I get the correct results.

Here is the code (https://gist.github.com/cswhjiang/cbc3d48cdd01efd5bcdf8ac92c0e66fa#file-review_net-py-L249). It seems that lines 249-259 are not correct. Could anyone give me some suggestions? Thanks.
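For reference, here is a boiled-down sketch of the review-step loop in question (variable names are illustrative, not the exact gist code):

import torch
import torch.nn as nn

class ReviewSteps(nn.Module):
    def __init__(self, hidden_size, num_steps):
        super(ReviewSteps, self).__init__()
        self.hidden_size = hidden_size
        self.num_steps = num_steps
        # one LSTMCell per review step, weights not shared;
        # currently kept in a plain Python list
        self.cells = [nn.LSTMCell(hidden_size, hidden_size)
                      for _ in range(num_steps)]

    def forward(self, h, c):
        batch_size = h.size(0)
        # review steps take no input, so feed zeros
        dummy = torch.zeros(batch_size, self.hidden_size)
        thought_vectors = torch.zeros(batch_size, self.num_steps, self.hidden_size)
        for i in range(self.num_steps):
            h, c = self.cells[i](dummy, (h, c))
            output = h
            thought_vectors[:, i, :] = output.clone()
        return thought_vectors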

Should add_module be used?
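I.e., registering each per-step cell in the constructor, something like this (hypothetical names):

for i, cell in enumerate(self.cells):
    # register each cell so its parameters appear in model.parameters()
    self.add_module('review_cell_%d' % i, cell)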

Maybe this is the cause:

thought_vectors[:, i, :] = output.clone()

Try again with

thought_vectors[:, i, :] = output # without .clone()

to see if it works.

No. I don’t think it is the reason.

Try not to use in-place assignment; it easily breaks the underlying autograd graph, so no gradients are back-propagated through it.

Try:

thought = []
for i in range(something):
    thought.append(output.clone().unsqueeze(1))  # output from review step i
thought_vectors = torch.cat(thought, dim=1)  # (batch, steps, hidden), same layout as thought_vectors[:, i, :]
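
A quick self-contained check that the append-and-cat version back-propagates into the cell parameters (toy sizes, not your actual model):

import torch
import torch.nn as nn

# toy review cell; any of the per-step LSTMs would behave the same
cell = nn.LSTMCell(4, 4)
h, c = torch.zeros(2, 4), torch.zeros(2, 4)

thought = []
for i in range(3):
    h, c = cell(torch.zeros(2, 4), (h, c))  # review step: no real input
    thought.append(h.unsqueeze(1))
thought_vectors = torch.cat(thought, dim=1)

thought_vectors.sum().backward()
print(cell.weight_hh.grad.abs().sum())  # non-zero, so gradients reach the cell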

Thanks @toonz.

I am trying to use ModuleList. Is there any way to make sure that the underlying graph is correct?

ModuleList will work too. You can try to visualise the graph to check if it’s what you want to build: https://discuss.pytorch.org/t/print-autograd-graph/692.
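For example (a sketch; hidden_size and num_steps are placeholders):

import torch.nn as nn

class Reviewer(nn.Module):
    def __init__(self, hidden_size, num_steps):
        super(Reviewer, self).__init__()
        # nn.ModuleList registers every cell as a submodule, so their
        # parameters are visible to parameters() and to the optimiser
        self.cells = nn.ModuleList(
            [nn.LSTMCell(hidden_size, hidden_size) for _ in range(num_steps)])

Either way, the key point is that the per-step cells are registered as submodules; a plain Python list hides them from the optimiser.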
