Accessing the dimensions of LSTM output

I am trying to train a dual encoder LSTM model for a chatbot using PyTorch. The idea is taken from a blog tutorial.

I wrote some code that builds a vocabulary from the training file, a dictionary that maps words to their ids, and a dictionary that maps ids to the corresponding word embedding vectors (partially initialized with GloVe embeddings).

I then defined two classes: the Encoder class defines the LSTM itself, and the DualEncoder class applies the Encoder to both the context and response utterances.

There seems to be a problem with the forward function in my DualEncoder class, though.
Here is the code snippet:

import torch
import torch.nn as nn

class DualEncoder(nn.Module):

    def __init__(self, encoder):
        super(DualEncoder, self).__init__()
        self.encoder = encoder
        self.number_of_layers = 1
        M = torch.FloatTensor(self.encoder.hidden_size, self.encoder.hidden_size).cuda()
        self.M = nn.Parameter(M, requires_grad=True)

    def forward(self, contexts, responses):
        context_out, context_hn = self.encoder(contexts)
        response_out, response_hn = self.encoder(responses)
        scores_list = []
        y_preds = None
        iter = context_hn[0].shape[0]  # to iterate over 999 examples

        for e in range(iter):
            context_h = context_hn[0][e].view(1, self.encoder.hidden_size)
            response_h = response_hn[0][e].view(self.encoder.hidden_size, 1)
            # gives vectors of hidden_size for each example
            dot_var = torch.mm(torch.mm(context_h, self.M), response_h)[0][0]

            dot_tensor = dot_var  # right-hand side lost in the post; dot_var is an assumption
            score = torch.sigmoid(dot_tensor)
            scores_list.append(score)  # assumed: needed for torch.stack below
        y_preds = torch.stack(scores_list).cuda()
        return y_preds
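
For reference, the score the loop is meant to compute for each example is sigmoid(c M r), where c is the context vector and r the response vector. The following is only a sketch on random tensors, not the model above: the names `context_h`, `response_h` and the sizes are made up for illustration, and it assumes one hidden vector per example is already available as a row of a (batch, hidden_size) tensor. It compares the per-example loop with a batched form that should give the same values.

```python
import torch

torch.manual_seed(0)
batch, hidden_size = 4, 300  # illustrative sizes, not the real data

# Hypothetical stand-ins for the final hidden states: one row per example.
context_h = torch.randn(batch, hidden_size)
response_h = torch.randn(batch, hidden_size)
M = torch.randn(hidden_size, hidden_size) * 0.01

# Per-example loop, mirroring the structure of forward().
loop_scores = []
for e in range(batch):
    c = context_h[e].view(1, hidden_size)   # (1, hidden_size)
    r = response_h[e].view(hidden_size, 1)  # (hidden_size, 1)
    loop_scores.append(torch.sigmoid(torch.mm(torch.mm(c, M), r)[0][0]))
loop_scores = torch.stack(loop_scores)      # (batch,)

# Batched form: row e of (context_h @ M) dotted with row e of response_h.
batched_scores = torch.sigmoid((torch.mm(context_h, M) * response_h).sum(dim=1))

print(loop_scores.shape, batched_scores.shape)
```

The batched form avoids the Python loop entirely, which also sidesteps any per-example indexing into the hidden state.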

According to PyTorch documentation, the dimensions of the LSTM output are:

output (seq_len, batch, hidden_size * num_directions):
tensor containing the output features (h_t) from the last layer of the RNN, for each t.

h_n (num_layers * num_directions, batch, hidden_size):
tensor containing the hidden state for t=seq_len.
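
To make those documented shapes concrete, here is a small sketch that runs a plain `nn.LSTM` on random input and prints the shapes. The sizes are arbitrary illustrative values, and it assumes a single-layer, unidirectional LSTM with the default `batch_first=False`; my own Encoder may wrap things differently.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; hidden_size matches the 300 used in the post.
seq_len, batch, input_size, hidden_size = 7, 5, 10, 300

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=1)
inputs = torch.randn(seq_len, batch, input_size)

# nn.LSTM returns the per-step outputs plus a tuple (h_n, c_n).
output, (h_n, c_n) = lstm(inputs)

print(output.shape)  # (seq_len, batch, hidden_size * num_directions)
print(h_n.shape)     # (num_layers * num_directions, batch, hidden_size)
print(h_n[0].shape)  # (batch, hidden_size): the whole batch for layer 0
```

Note that the first dimension of `h_n` indexes layers and directions, not examples, so `h_n[0]` still contains every example in the batch.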

However, when I try running it on Floydhub, I get the following error:

2018-01-05 07:07:42,903 INFO - Run Output:
2018-01-05 07:12:32,965 INFO - Traceback (most recent call last):
2018-01-05 07:12:32,968 INFO - File “”, line 279, in
2018-01-05 07:12:32,968 INFO - y_preds = dual_encoder(context_matrix, response_matrix)
2018-01-05 07:12:32,969 INFO - File “/usr/local/lib/python3.6/site-packages/torch/nn/modules/”, line 325, in __call__
2018-01-05 07:12:32,969 INFO - result = self.forward(*input, **kwargs)
2018-01-05 07:12:32,969 INFO - File “”, line 222, in forward
2018-01-05 07:12:32,969 INFO - context_h = context_hn[0][e].view(1, self.encoder.hidden_size)
2018-01-05 07:12:32,969 INFO - RuntimeError: invalid argument 2: size ‘[1 x 300]’ is invalid for input with 299700 elements at /pytorch/torch/lib/TH/THStorage.c:41

I currently use 999 training examples and my hidden_size is 300, which gives 999 * 300 = 299700 elements. But why does the loop not take just one example per iteration?
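
The element count suggests one possible explanation, sketched below on dummy tensors. It rests on an assumption about code not shown here: that the Encoder returns the raw `nn.LSTM` result, so `context_hn` is the tuple `(h_n, c_n)` and `context_hn[0]` is `h_n` with shape (1, 999, 300). Under that assumption, indexing `context_hn[0][e]` selects a layer rather than an example.

```python
import torch

batch, hidden_size = 999, 300

# Dummy stand-ins for what a single-layer nn.LSTM would return (assumption).
h_n = torch.randn(1, batch, hidden_size)
c_n = torch.randn(1, batch, hidden_size)
context_hn = (h_n, c_n)

print(context_hn[0].shape[0])  # 1 -> the loop would run once, not 999 times

first = context_hn[0][0]       # shape (999, 300): the whole batch
print(first.numel())           # 299700

try:
    first.view(1, hidden_size)  # cannot fit 299700 elements into 1 x 300
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print(err)

# Selecting the last layer first, then the example, gives one vector.
per_example = context_hn[0][-1]                   # (batch, hidden_size)
context_h = per_example[0].view(1, hidden_size)   # (1, hidden_size)
print(context_h.shape)
```

If this matches the real Encoder output, the loop bound would need to come from the batch dimension (`shape[1]`) instead of `shape[0]`.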

I played around a bit with the dimensions and the slicing, but without success. I find this highly confusing, because I wrote some dummy code to check whether the general approach within that for-loop works, using random tensors with the same dimensions the LSTM output should have, and that dummy code works perfectly.

Please help me :smile: