I am trying to implement a complex hierarchical Seq2Seq model to generate multi-sentence responses for dialogues. Everything is fine when I run forward() on my model, but when I call backward() on some samples, I get:

RuntimeError: sizes do not match at /b/wheel/pytorchsrc/torch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu:216
Then, for a sample where backward() runs successfully, I try to print the gradients of my model's parameters like this:
for name, parameter in model.named_parameters():
    print(name)
    if parameter.grad is not None:
        print(parameter.grad.size())
    else:
        print(None)
I got:
encoder.embedding.weight None
encoder.rnn_sent.weight_ih_l0 None
encoder.rnn_sent.weight_hh_l0 None
encoder.rnn_sent.weight_ih_l0_reverse None
encoder.rnn_sent.weight_hh_l0_reverse None
encoder.rnn_word.weight_ih_l0 None
encoder.rnn_word.weight_hh_l0 None
encoder.rnn_word.weight_ih_l0_reverse None
encoder.rnn_word.weight_hh_l0_reverse None
decoder.embedding.weight torch.Size([20001, 300])
decoder.rnn_cell_sent.weight_ih torch.Size([512, 640])
decoder.rnn_cell_sent.weight_hh torch.Size([512, 128])
decoder.rnn_word.weight_ih_l0 torch.Size([2048, 428])
decoder.rnn_word.weight_hh_l0 torch.Size([2048, 512])
decoder.attention.att_linear.weight torch.Size([1, 128])
decoder.attention.att_linear.bias torch.Size([1])
decoder.linear.weight torch.Size([20001, 512])
decoder.linear.bias torch.Size([20001])
Some of the gradients have valid values, but the others are None, and I don't know how this happened. Maybe there is a bug in the backward engine?
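For what it's worth, a grad of None by itself just means autograd never reached that parameter during backward(). A minimal sketch (with a hypothetical two-layer module, not my actual model) showing the same pattern, where a parameter that never feeds into the loss keeps grad = None while the used one gets a gradient:

```python
import torch
import torch.nn as nn

# Hypothetical toy module: `used` participates in forward(),
# `unused` is registered but never called.
class Toy(nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)

    def forward(self, x):
        return self.used(x)

model = Toy()
out = model(torch.randn(3, 4))
out.sum().backward()

for name, p in model.named_parameters():
    # Parameters not reached by backward() keep grad = None.
    print(name, None if p.grad is None else p.grad.size())
```

So if the encoder's gradients are all None, it may mean the graph between encoder and decoder is being broken somewhere (e.g. by detaching or re-wrapping the encoder outputs), rather than a bug in the engine itself.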