Parameters are not updated

I am implementing the review net (link: https://github.com/kimiyoung/review_net/blob/master/image_caption_offline/reason_att_copy.lua ). It is just an ordinary encoder-decoder framework plus review steps. A review step is similar to a decoder step, but there is no input and the weights of the LSTMs are not shared. In my implementation, it seems that the parameters of the LSTMs in the review steps are not updated during training. If the review steps are removed, the code works fine and I get the correct results.

Here is the code (https://gist.github.com/cswhjiang/cbc3d48cdd01efd5bcdf8ac92c0e66fa#file-review_net-py-L249). It seems that lines 249-259 are not correct. Could anyone give me some suggestions? Thanks.
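For reference, here is a boiled-down sketch of the review-step loop in question (variable names are illustrative, not the exact gist code):

import torch
import torch.nn as nn

class ReviewSteps(nn.Module):
    def __init__(self, hidden_size, num_steps):
        super(ReviewSteps, self).__init__()
        self.hidden_size = hidden_size
        self.num_steps = num_steps
        # one LSTMCell per review step, weights not shared;
        # currently kept in a plain Python list
        self.cells = [nn.LSTMCell(hidden_size, hidden_size)
                      for _ in range(num_steps)]

    def forward(self, h, c):
        batch_size = h.size(0)
        # review steps take no input, so feed zeros
        dummy = torch.zeros(batch_size, self.hidden_size)
        thought_vectors = torch.zeros(batch_size, self.num_steps, self.hidden_size)
        for i in range(self.num_steps):
            h, c = self.cells[i](dummy, (h, c))
            output = h
            thought_vectors[:, i, :] = output.clone()
        return thought_vectors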

Should add_module be used?
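I.e., registering each per-step cell in the constructor, something like this (hypothetical names):

for i, cell in enumerate(self.cells):
    # register each cell so its parameters appear in model.parameters()
    self.add_module('review_cell_%d' % i, cell)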

Maybe this is the cause:

thought_vectors[:, i, :] = output.clone()

Try again with

thought_vectors[:, i, :] = output # without .clone()

to see if it works.

No. I don’t think it is the reason.

Try not to use in-place assignment; it easily breaks the underlying autograd graph, so no gradients are back-propagated through it.

Try:

thought = []
for i in range(something):
    thought.append(output.clone().unsqueeze(1))  # output from review step i
thought_vectors = torch.cat(thought, dim=1)  # (batch, steps, hidden), same layout as thought_vectors[:, i, :]
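
A quick self-contained check that the append-and-cat version back-propagates into the cell parameters (toy sizes, not your actual model):

import torch
import torch.nn as nn

# toy review cell; any of the per-step LSTMs would behave the same
cell = nn.LSTMCell(4, 4)
h, c = torch.zeros(2, 4), torch.zeros(2, 4)

thought = []
for i in range(3):
    h, c = cell(torch.zeros(2, 4), (h, c))  # review step: no real input
    thought.append(h.unsqueeze(1))
thought_vectors = torch.cat(thought, dim=1)

thought_vectors.sum().backward()
print(cell.weight_hh.grad.abs().sum())  # non-zero, so gradients reach the cell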

Thanks @toonz.

I am trying to use ModuleList. Is there any way to make sure that the underlying graph is correct?

ModuleList will work too. You can try to visualise the graph to check if it’s what you want to build: https://discuss.pytorch.org/t/print-autograd-graph/692.
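For example (a sketch; hidden_size and num_steps are placeholders):

import torch.nn as nn

class Reviewer(nn.Module):
    def __init__(self, hidden_size, num_steps):
        super(Reviewer, self).__init__()
        # nn.ModuleList registers every cell as a submodule, so their
        # parameters are visible to parameters() and to the optimiser
        self.cells = nn.ModuleList(
            [nn.LSTMCell(hidden_size, hidden_size) for _ in range(num_steps)])

Either way, the key point is that the per-step cells are registered as submodules; a plain Python list hides them from the optimiser.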
