I am trying to train a dual-encoder LSTM model for a chatbot using PyTorch. The idea is taken from a blog tutorial.
I wrote code that builds a vocabulary from the training file, a dictionary mapping words to their IDs, and a dictionary mapping IDs to the corresponding word embedding vectors (partially initialized with GloVe embeddings).
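The vocabulary and embedding steps described above could look roughly like this. This is a minimal sketch, not the actual code from the post: `build_vocab`, `build_embedding_matrix`, the reserved `<pad>`/`<unk>` tokens, and the tiny in-memory `glove` dictionary are all hypothetical stand-ins (real GloVe vectors would be read from the downloaded text file). Words missing from GloVe get small random vectors.

```python
import torch

def build_vocab(lines):
    """Map each distinct whitespace-separated token to an integer id."""
    # Hypothetical reserved ids: 0 for padding, 1 for unknown words.
    word_to_id = {"<pad>": 0, "<unk>": 1}
    for line in lines:
        for word in line.split():
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    return word_to_id

def build_embedding_matrix(word_to_id, glove, dim):
    """Row i holds the vector for id i: GloVe where available, random otherwise."""
    weights = torch.randn(len(word_to_id), dim) * 0.1  # fallback init
    for word, idx in word_to_id.items():
        if word in glove:
            weights[idx] = torch.tensor(glove[word])
    weights[word_to_id["<pad>"]] = 0.0  # keep the padding vector at zero
    return weights

# Toy data standing in for the training file and the GloVe vectors.
lines = ["hello there", "hello world"]
glove = {"hello": [1.0, 0.0, 0.0], "world": [0.0, 1.0, 0.0]}
word_to_id = build_vocab(lines)
weights = build_embedding_matrix(word_to_id, glove, dim=3)
embedding = torch.nn.Embedding.from_pretrained(weights, freeze=False)
```

Here a single matrix indexed by ID plays the role of the ID-to-vector dictionary, which lets `nn.Embedding` do the lookup and fine-tune the vectors during training.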
I then defined two classes: the `Encoder` class defines the LSTM itself, and the `DualEncoder` class applies the `Encoder` to both the context and the response utterances.
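The `Encoder` class is not shown in the post, so its exact return value is unknown. A minimal sketch, assuming it only embeds the token IDs and returns whatever `nn.LSTM` returns, is below (the constructor arguments are hypothetical). Note that `nn.LSTM` returns `output, (h_n, c_n)`, so the second value is a pair of tensors, not a single tensor.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Hypothetical encoder: an embedding layer followed by a single-layer LSTM."""

    def __init__(self, vocab_size, emb_dim, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size)  # expects (seq_len, batch, emb_dim)

    def forward(self, token_ids):
        # token_ids: LongTensor of shape (seq_len, batch)
        embedded = self.embedding(token_ids)
        output, (h_n, c_n) = self.lstm(embedded)
        return output, (h_n, c_n)

encoder = Encoder(vocab_size=50, emb_dim=8, hidden_size=6)
ids = torch.randint(0, 50, (7, 4))  # seq_len=7, batch=4
out, hn_tuple = encoder(ids)
```

With this return convention, `context_hn[0]` inside `DualEncoder.forward` would be `h_n`, not the hidden state of a single example.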
There seems to be a problem with the `forward` function in my `DualEncoder` class, though.
Here is my code snippet:
```python
import torch
import torch.nn as nn
import torch.nn.init as init

class DualEncoder(nn.Module):
    def __init__(self, encoder):
        super(DualEncoder, self).__init__()
        self.encoder = encoder
        self.number_of_layers = 1
        M = torch.FloatTensor(self.encoder.hidden_size, self.encoder.hidden_size).cuda()
        init.normal(M)
        self.M = nn.Parameter(M, requires_grad = True)

    def forward(self, contexts, responses):
        context_out, context_hn = self.encoder(contexts)
        response_out, response_hn = self.encoder(responses)

        scores_list = []
        y_preds = None

        iter = context_hn[0].shape[0] #to iterate over 999 examples

        for e in range(iter):
            context_h = context_hn[0][e].view(1, self.encoder.hidden_size)
            response_h = response_hn[0][e].view(self.encoder.hidden_size,1)
            #gives vectors of hidden_size for each example

            dot_var = torch.mm(torch.mm(context_h, self.M), response_h)[0][0]
            dot_tensor = dot_var.data
            dot_tensor.cuda()

            score = torch.sigmoid(dot_tensor)
            scores_list.append(score)

        y_preds = torch.stack(scores_list).cuda()

        return y_preds
```
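Written out, the loop is meant to compute, for each example $i$ in the batch, a bilinear score between the final context state $c_i$ and the final response state $r_i$:

$$
s_i = \sigma\left(c_i^\top M\, r_i\right), \qquad c_i, r_i \in \mathbb{R}^{h}, \quad M \in \mathbb{R}^{h \times h},
$$

where $h$ is `hidden_size` and $\sigma$ is the logistic sigmoid. In the code, `context_h` is meant to be the $1 \times h$ row $c_i^\top$, `response_h` the $h \times 1$ column $r_i$, and `torch.mm(torch.mm(context_h, self.M), response_h)` the resulting $1 \times 1$ product.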
According to the PyTorch documentation, the LSTM outputs have the following shapes:

- `output` of shape `(seq_len, batch, hidden_size * num_directions)`: tensor containing the output features `(h_t)` from the last layer of the RNN, for each `t`.
- `h_n` of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`.
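The quoted shapes describe `h_n` itself, but `nn.LSTM` actually returns `output, (h_n, c_n)`. If the encoder passes that pair through unchanged, `context_hn[0]` is `h_n`, whose first dimension is `num_layers * num_directions`, not the batch size. A quick shape check, using the batch size and hidden size from the post and an arbitrary `seq_len` and `input_size` chosen only for illustration:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 999, 10, 300
lstm = nn.LSTM(input_size, hidden_size, num_layers=1)
x = torch.randn(seq_len, batch, input_size)

output, hn_tuple = lstm(x)  # hn_tuple is the pair (h_n, c_n)
h_n = hn_tuple[0]

print(output.shape)  # torch.Size([5, 999, 300])
print(h_n.shape)     # torch.Size([1, 999, 300])
print(h_n[0].shape)  # torch.Size([999, 300]): dim 0 selects a layer, not an example
```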
However, when I run it on FloydHub, I get the following error:
```
2018-01-05 07:07:42,903 INFO - Run Output:
2018-01-05 07:12:32,965 INFO - Traceback (most recent call last):
2018-01-05 07:12:32,968 INFO - File "all_scripts.py", line 279, in <module>
2018-01-05 07:12:32,968 INFO - y_preds = dual_encoder(context_matrix, response_matrix)
2018-01-05 07:12:32,969 INFO - File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
2018-01-05 07:12:32,969 INFO - result = self.forward(*input, **kwargs)
2018-01-05 07:12:32,969 INFO - File "all_scripts.py", line 222, in forward
2018-01-05 07:12:32,969 INFO - context_h = context_hn[0][e].view(1, self.encoder.hidden_size)
2018-01-05 07:12:32,969 INFO - RuntimeError: invalid argument 2: size '[1 x 300]' is invalid for input with 299700 elements at /pytorch/torch/lib/TH/THStorage.c:41
```
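The numbers in this error are consistent with the following reading, assuming `context_hn` is the `(h_n, c_n)` pair: `iter = context_hn[0].shape[0]` evaluates to 1 (the number of layers), so the loop runs once with `e = 0`, and `context_hn[0][0]` is the whole `(999, 300)` batch of final hidden states, which cannot be viewed as `(1, 300)`. A small reproduction with random tensors standing in for the LSTM states (the exact error message differs between PyTorch versions):

```python
import torch

batch, hidden_size = 999, 300
h_n = torch.randn(1, batch, hidden_size)  # (num_layers, batch, hidden_size)
c_n = torch.randn(1, batch, hidden_size)
context_hn = (h_n, c_n)  # the second value returned by nn.LSTM

n_iter = context_hn[0].shape[0]  # 1: the number of layers, not 999
selected = context_hn[0][0]      # shape (999, 300): every example at once

try:
    selected.view(1, hidden_size)  # 299700 elements cannot fit in (1, 300)
    error = None
except RuntimeError as exc:
    error = exc
```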
(I currently use 999 training examples and my `hidden_size` is 300, so 999 * 300 = 299700.) But why does it not take just one example in each iteration of the loop?
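Assuming the encoder returns `output, (h_n, c_n)`, the smallest change to the loop would be to index `context_hn[0][-1][e]` (and likewise for `response_hn`) and loop over `context_hn[0].shape[1]`. A sketch of an alternative that avoids the Python loop altogether: take the last layer of `h_n` (shape `(batch, hidden_size)`) and compute every score $c_i^\top M r_i$ in one batched operation, keeping the result in the autograd graph so that gradients reach `M`. `TinyEncoder`, `DualEncoderSketch`, and the sizes are hypothetical, and the example runs on the CPU; calling `.cuda()` on the model would move `M` together with the encoder, since `M` is a registered parameter.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Hypothetical stand-in encoder that returns nn.LSTM's outputs unchanged."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size)

    def forward(self, token_ids):  # token_ids: (seq_len, batch)
        return self.lstm(self.embedding(token_ids))  # output, (h_n, c_n)

class DualEncoderSketch(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        h = encoder.hidden_size
        self.M = nn.Parameter(torch.randn(h, h) * 0.01)

    def forward(self, contexts, responses):
        _, (context_h_n, _) = self.encoder(contexts)
        _, (response_h_n, _) = self.encoder(responses)
        c = context_h_n[-1]  # last layer: (batch, hidden_size)
        r = response_h_n[-1]
        # Row-wise c_i^T M r_i for all examples at once.
        logits = ((c @ self.M) * r).sum(dim=1)
        return torch.sigmoid(logits)  # (batch,)

model = DualEncoderSketch(TinyEncoder(vocab_size=50, hidden_size=8))
contexts = torch.randint(0, 50, (6, 4))   # seq_len=6, batch=4
responses = torch.randint(0, 50, (3, 4))  # responses may have a different length
y_preds = model(contexts, responses)
y_preds.sum().backward()
```

Returning probabilities matches the original code; for training, returning `logits` and using `nn.BCEWithLogitsLoss` is usually more numerically stable.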
I played around a bit with the dimensions and the slicing, but without success. I find this highly confusing, because I wrote some dummy code to check whether the general approach inside that for loop works, using random tensors with the same dimensions the LSTM output should have, and the dummy code works perfectly.
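Even with the shapes fixed, two lines inside the loop would cause problems that a shape-only dummy test would likely not reveal. `dot_var.data` returns a tensor detached from the autograd graph, so the loss would produce no gradient for `M`. And `dot_tensor.cuda()` returns a CUDA copy (or the same tensor, if it is already on the GPU) rather than moving `dot_tensor` in place; since the result is discarded, that line does nothing. A small CPU-only check of the first point (the 3×3 sizes are arbitrary):

```python
import torch

M = torch.randn(3, 3, requires_grad=True)
c = torch.randn(1, 3)
r = torch.randn(3, 1)

dot_var = torch.mm(torch.mm(c, M), r)[0][0]

# Going through .data cuts the graph: autograd no longer links the score to M.
detached_score = torch.sigmoid(dot_var.data)

# Using dot_var directly preserves the graph, so backward() reaches M.
score = torch.sigmoid(dot_var)
score.backward()
```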
Please help me.