Some grads are None when I call backward()

I am trying to implement a fairly complex hierarchical Seq2Seq model that generates multi-sentence responses for dialogues. Everything is fine when I run forward() on my model, but when I call backward() on some samples I get:
RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu:216
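
For reference, the failing step looks roughly like this (model, batch, and compute_loss are placeholder names, not my real training loop); I gather that on newer PyTorch builds, wrapping the step in torch.autograd.detect_anomaly() makes the traceback name the forward op whose backward raises:

import torch

# Hypothetical harness: `model`, `batch`, and `compute_loss` are placeholders.
# Inside detect_anomaly(), a failing backward reports the forward call that
# produced the bad gradient, instead of just the THC kernel location.
with torch.autograd.detect_anomaly():
    loss = compute_loss(model(batch))
    loss.backward()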

Then, for the samples where backward() does succeed, I print the grads of my model’s parameters like this:

for name, parameter in model.named_parameters():
    if parameter.grad is not None:
        print(name, parameter.grad.size())
    else:
        print(name, None)

I got:

encoder.embedding.weight None
encoder.rnn_sent.weight_ih_l0 None
encoder.rnn_sent.weight_hh_l0 None
encoder.rnn_sent.weight_ih_l0_reverse None
encoder.rnn_sent.weight_hh_l0_reverse None
encoder.rnn_word.weight_ih_l0 None
encoder.rnn_word.weight_hh_l0 None
encoder.rnn_word.weight_ih_l0_reverse None
encoder.rnn_word.weight_hh_l0_reverse None
decoder.embedding.weight torch.Size([20001, 300])
decoder.rnn_cell_sent.weight_ih torch.Size([512, 640])
decoder.rnn_cell_sent.weight_hh torch.Size([512, 128])
decoder.rnn_word.weight_ih_l0 torch.Size([2048, 428])
decoder.rnn_word.weight_hh_l0 torch.Size([2048, 512])
decoder.attention.att_linear.weight torch.Size([1, 128])
decoder.attention.att_linear.bias torch.Size([1])
decoder.linear.weight torch.Size([20001, 512])
decoder.linear.bias torch.Size([20001])

Some of the grads have valid values, but every parameter of the encoder has a None grad while the decoder’s look fine. I don’t know how this happened; maybe there is a bug in the backward engine?
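
As far as I understand, a parameter legitimately ends up with grad None when the loss never depends on it, e.g. when the graph is cut by .detach() or .data somewhere between the encoder and the loss. A toy sketch of that behavior (a made-up two-stage model on a recent PyTorch, not my actual code):

import torch
import torch.nn as nn

# Hypothetical two-stage model: cutting the graph with .detach() leaves the
# first stage's weights with grad == None after backward().
enc = nn.Linear(4, 4)
dec = nn.Linear(4, 1)

x = torch.randn(2, 4)
loss = dec(enc(x).detach()).sum()  # .detach() severs enc from the loss
loss.backward()

print(enc.weight.grad)           # None -- no gradient reached enc
print(dec.weight.grad.size())    # torch.Size([1, 4])

My encoder’s outputs do feed the decoder (through src_input and the initial hidden states), though, so I don’t see where such a cut would come from.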

The decoder’s forward function looks like this:

def forward(self, input, length, src_input, hidden_sent, hidden_word):
    batch_size, num_sent, num_word = input.size()
    output_list = []
    for sent_index in range(num_sent):
        # prepare the input for the word-level rnn
        indexed_sent_input, indexed_length = input[:, sent_index], length[:, sent_index]
        embedding_one_sent = self.embedding(indexed_sent_input)
        # broadcast the sentence-level hidden state across all word positions
        rnn_h_sent = hidden_sent[0].unsqueeze(1)
        rnn_h_sent_expand = rnn_h_sent.expand(batch_size, num_word, self.num_hidden_sent)
        word_input = torch.cat([embedding_one_sent, rnn_h_sent_expand], dim=2)
        rnn_result_word, hidden_word = self.rnn_word(word_input, hidden_word)
        # collect the hidden state at each sequence's last valid timestep
        rnn_result_last, rnn_result_sent = collect_last_hidden(rnn_result_word, indexed_length)
        # attend over the encoder output and step the sentence-level cell
        attentioned_src_input = self.attention(src_input, rnn_h_sent)
        sent_input = torch.cat([rnn_result_last, attentioned_src_input], dim=1)
        hidden_sent = self.rnn_cell_sent(sent_input, hidden_sent)
        output_list.append(rnn_result_sent)
    output_list = [self.linear(output) for output in output_list]
    return output_list, hidden_sent, hidden_word
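
As a further check, here is a small diagnostic sketch (written against the newer autograd API, where graph nodes expose grad_fn and next_functions and AccumulateGrad nodes carry a .variable attribute — an assumption on my part for older builds) that walks the graph from the loss and collects every leaf it can reach; any parameter missing from the result would explain a None grad:

import torch

# Walk the autograd graph from `loss` and gather all reachable leaves.
# Assumes the newer API: grad_fn / next_functions, and AccumulateGrad
# nodes exposing the leaf tensor via .variable.
def reachable_leaves(loss):
    seen, leaves = set(), set()
    stack = [loss.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        if hasattr(fn, 'variable'):  # AccumulateGrad node -> a leaf tensor
            leaves.add(fn.variable)
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return leaves

# Usage: any parameter not reachable from the loss keeps grad == None.
# leaves = reachable_leaves(loss)
# for name, p in model.named_parameters():
#     if p not in leaves:
#         print(name, 'is not connected to the loss')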