Hello,
I am trying to re-implement an answer retrieval system, previously implemented in Theano (the Sequential Matching Network, SMN), in PyTorch.
The forward method of the model I implemented looks like this:
def forward(self, context_utts, response_utt, num_context_utt=10):
    # Embedding encoding
    embeddings_context = []
    for i in range(num_context_utt):
        embeddings_context.append(self.embedding_layer(context_utts[i]))
    embeddings_response = self.embedding_layer(response_utt)

    # GRU encoding
    gru_hidden_context = []
    for i in range(num_context_utt):
        # packed_emb = embeddings_context[i]
        # if lengths is not None and not self.no_pack_padded_seq:
        #     # Lengths data is wrapped inside a Variable.
        #     lengths = lengths.view(-1).tolist()
        #     packed_emb = pack(emb, lengths)
        outputs, _ = self.RNN_first_layer(embeddings_context[i], None)
        # if lengths is not None and not self.no_pack_padded_seq:
        #     outputs = unpack(outputs)[0]
        gru_hidden_context.append(outputs)
    # Response
    outputs, _ = self.RNN_first_layer(embeddings_response, None)
    gru_hidden_response = outputs

    # Convolution layer
    convolution_output = []
    for i in range(num_context_utt):
        convolution_output.append(self.Convolution(embeddings_context[i], embeddings_response,
                                                   gru_hidden_context[i], gru_hidden_response))
    # Stack the convolution outputs along dim 1: (batch, num_utt, features)
    convolution_output = torch.stack(convolution_output, 1)

    # GRU final layer
    outputs, _ = self.RNN_final_layer(convolution_output, None)
    # SMN_last: only keep the last hidden state of the output sequence
    output = outputs[:, -1, :]

    # Logistic regression
    pred_prob, pred_class = self.LogisticRegression(output)
    return pred_prob, pred_class
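For reference, this is roughly how I call it; the sizes below are made up, just to show the input layout I am assuming (batch_first GRUs, one (batch, seq_len) LongTensor per context utterance):

# Hypothetical sizes, only to illustrate the expected input layout.
import torch

batch_size, num_utt, seq_len, vocab_size = 32, 10, 50, 20000
context_utts = [torch.randint(0, vocab_size, (batch_size, seq_len)) for _ in range(num_utt)]
response_utt = torch.randint(0, vocab_size, (batch_size, seq_len))
pred_prob, pred_class = model(context_utts, response_utt, num_context_utt=num_utt)  # model = the SMN module above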
The code runs fine: the NLL loss is computed correctly afterwards, and I can see the optimizer updating the parameter values after the gradients are computed. The problem is that the code is a bit slow, and I think it is mainly because of the Python loops in the forward pass: building gru_hidden_context and convolution_output are the slowest operations.
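For completeness, I assume another way to speed this up would be to fold the utterance loop into the batch dimension so the embedding layer and the first GRU run only once; a rough, untested sketch of that idea (the helper name and the batch_first shapes above are assumptions on my part):

def encode_context_batched(self, context_utts, num_context_utt=10):
    # Hypothetical helper: merge the utterance dimension into the batch dimension
    # so the embedding layer and the first GRU are called once instead of in a loop.
    stacked = torch.stack(context_utts[:num_context_utt], 0)   # (num_utt, batch, seq_len)
    num_utt, batch, seq_len = stacked.shape
    flat = stacked.view(num_utt * batch, seq_len)               # (num_utt*batch, seq_len)
    emb = self.embedding_layer(flat)                            # (num_utt*batch, seq_len, emb_dim)
    outputs, _ = self.RNN_first_layer(emb, None)                # single batch_first GRU call
    return outputs.view(num_utt, batch, seq_len, -1)            # one output tensor per utterance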
For now, though, I just tried turning those lists into generators. The code now looks like this:
def forward(self, context_utts, response_utt, num_context_utt=10):
    # Embedding encoding
    embeddings_context = []
    for i in range(num_context_utt):
        embeddings_context.append(self.embedding_layer(context_utts[i]))
    embeddings_response = self.embedding_layer(response_utt)

    # GRU encoding (generator; [0] keeps only the outputs, not the hidden state)
    gru_hidden_context = (self.RNN_first_layer(embeddings_context[i], None)[0]
                          for i in range(num_context_utt))
    # Response
    outputs, _ = self.RNN_first_layer(embeddings_response, None)
    gru_hidden_response = outputs

    # Convolution layer (generator)
    convolution_output = (self.Convolution(embeddings_context[i], embeddings_response,
                                           next(gru_hidden_context), gru_hidden_response)
                          for i in range(num_context_utt))
    # Stack the convolution outputs together -> this is where it now fails
    convolution_output = torch.stack(convolution_output, 1)

    # GRU final layer
    outputs, _ = self.RNN_final_layer(convolution_output, None)
    # SMN_last: only keep the last hidden state of the output sequence
    output = outputs[:, -1, :]

    # Logistic regression
    pred_prob, pred_class = self.LogisticRegression(output)
    return pred_prob, pred_class
Now both loops go much faster, but I have a new problem:
TypeError: stack(): argument 'tensors' (position 1) must be a tuple of Tensors, not generator
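The obvious workaround I can think of is to materialise the generator right before stacking, something like:

# Force the generator into a tuple so torch.stack() accepts it (untested here).
convolution_output = torch.stack(tuple(convolution_output), 1)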
I assume that works, but it just rebuilds the sequence I was trying to avoid. I want to keep my code fast: is there a way of stacking the elements of a generator without materialising them all first?
thanks in advance