"RuntimeError: input must have 3 dimensions, got 2" with LSTM model

beneyal · November 19, 2018, 9:54am

Hi,

I’m trying to train an LSTM for language modeling using bigrams. I managed to get through the training (I don’t know if it was any good), but now I can’t use the model for inference.

Code for the dataset:

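import nltk
import torch
from torch.utils import data

# w2i (word -> index dict) and device are defined elsewhere in the notebook
class RNNIndexLMDataset(data.Dataset):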
    def __init__(self, train):
        self._bigrams = list(nltk.bigrams(train))
    
    def __len__(self):
        return len(self._bigrams)
    
    def __getitem__(self, index):
        w1, w2 = self._bigrams[index]
        w1 = torch.LongTensor([w2i[w1]]).to(device)
        w2 = torch.LongTensor([w2i[w2]]).to(device)
        return w1, w2

Code for the model:

import torch
import torch.nn as nn

class RecurrentLM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size, num_layers):
        super(RecurrentLM, self).__init__()
        self.embedding_dim = embedding_dim
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers=self.num_layers)
        self.fc = nn.Linear(hidden_size, vocab_size)
        self.softmax = nn.LogSoftmax(dim=1)
    
    def forward(self, input):
        out = self.embedding(input)
        hidden = self._init_hidden()
        out = out.squeeze().unsqueeze(dim=0)  # from (128, 1, 300) to (1, 128, 300) = (seqlen, batch, input)
        out, hidden = self.lstm(out, hidden)
        out = self.fc(out)
        out = self.softmax(out)
        return out
    
    def _init_hidden(self):
        return (torch.zeros(self.num_layers, BATCH_SIZE, self.hidden_size).to(device),
                torch.zeros(self.num_layers, BATCH_SIZE, self.hidden_size).to(device))
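One detail worth checking here: _init_hidden allocates h_0 and c_0 with the global BATCH_SIZE, but nn.LSTM requires the state's batch dimension to match the input's. The sketch below uses arbitrary sizes (assumptions, not the values from the post) to show that a batch-of-1 input with a BATCH_SIZE-sized state is rejected. It also shows that sizing the state from the input, or simply not passing a state (nn.LSTM then defaults h_0 and c_0 to zeros), gives the same result.

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration only (not the values used in the post).
EMBEDDING_DIM, HIDDEN, BATCH_SIZE = 300, 64, 128

lstm = nn.LSTM(EMBEDDING_DIM, HIDDEN, num_layers=1)
x = torch.randn(1, 1, EMBEDDING_DIM)  # (seq_len=1, batch=1, input_size)

def zero_state(batch_size, num_layers=1):
    # Both h_0 and c_0 have shape (num_layers, batch, hidden_size).
    return (torch.zeros(num_layers, batch_size, HIDDEN),
            torch.zeros(num_layers, batch_size, HIDDEN))

try:
    lstm(x, zero_state(BATCH_SIZE))  # state batch 128 vs. input batch 1
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

out, _ = lstm(x, zero_state(x.size(1)))  # state sized from the input's batch dimension
out_default, _ = lstm(x)                 # no state passed: zeros of the right size
print(out.shape)                         # torch.Size([1, 1, 64])
print(torch.allclose(out, out_default))  # True
```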

Code for the training:

import matplotlib.pyplot as plt
import torch.optim as optim

dataset = RNNIndexLMDataset(train)
data_loader = data.DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
losses = []
loss_fn = nn.NLLLoss()
model = RecurrentLM(VOCAB_SIZE, EMBEDDING_DIM, NUM_HIDDEN, num_layers=1).cuda()
optimizer = optim.Adam(model.parameters())
epochs = 10
for _ in range(epochs):
    total_loss = 0
    for batch in data_loader:
        w, y = batch
        model.zero_grad()
        yhat = model(w)
        loss = loss_fn(yhat.squeeze(), y.squeeze())
        loss.backward()
        optimizer.step()
        total_loss += loss.item()  # .item() keeps a Python float instead of holding on to the autograd graph
    print('Loss: %.4f' % total_loss, end='\r')
    losses.append(total_loss)
RNNLM = model
plt.plot(losses);

and finally, this is what crashes:

w = torch.LongTensor([[w2i['with']]]).to(device)
RNNLM(w.view(1, 1, -1))

Thanks!

ptrblck · November 19, 2018, 3:49pm

I think your way of permuting the dimensions might not work if you provide a single sample with a batch size of 1.
Currently you are squeezing and unsqueezing the output of your embedding to change the dimensions to [seqlen, batch, input].
However, squeeze() removes every dimension of size 1, so if your batch size is only 1, the batch dimension gets squeezed out as well.
A better approach would be to use out = out.permute(1, 0, 2) and call contiguous() on it if necessary.
Could you try that and see if it’s working?
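A minimal sketch of the difference, using a hypothetical 10-word vocabulary and the 300-dim embedding from the comment in forward(). The squeeze/unsqueeze trick yields (1, 128, 300) for a full batch, but only (1, 300) for a batch of one, which is exactly the 2-D input the LSTM complains about. permute(1, 0, 2) keeps all three dimensions in both cases.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 300)             # hypothetical vocab of 10 words, 300-dim vectors
batch = torch.randint(0, 10, (128, 1))  # what the DataLoader yields: (batch, seq_len=1)
single = torch.randint(0, 10, (1, 1))   # a single sample with batch size 1

def squeeze_unsqueeze(x):
    # The approach from the original forward(): drops ALL size-1 dims, then adds one back.
    return x.squeeze().unsqueeze(dim=0)

def to_seq_first(x):
    # (batch, seq_len, emb) -> (seq_len, batch, emb), keeping every dimension.
    return x.permute(1, 0, 2).contiguous()

print(squeeze_unsqueeze(emb(batch)).shape)   # torch.Size([1, 128, 300])
print(squeeze_unsqueeze(emb(single)).shape)  # torch.Size([1, 300]): only 2 dims
print(to_seq_first(emb(single)).shape)       # torch.Size([1, 1, 300])
```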

beneyal · November 19, 2018, 6:17pm

It doesn’t seem to be working. It works if I use batches, but when I try to use a “batch” of size 1, it’s as if the dimensions are flipped, and then the permute breaks the dimensionality again. Maybe I’m using a batch of size 1 incorrectly?

ptrblck · November 19, 2018, 6:21pm

Could you print the shape after the permute call?
It should be [1, 1, 300].

beneyal · November 19, 2018, 6:27pm

(where it says wi.shape it’s actually w.shape)

ptrblck · November 19, 2018, 7:38pm

It looks like you are not using the DataLoader for your single sample, so the batch dimension is missing there. Try w = w.unsqueeze(0) and pass it to the model.
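To see where the missing dimension comes from, here is a small sketch with a hypothetical ToyBigrams dataset whose items have the same shape as RNNIndexLMDataset's (a LongTensor of shape (1,) per word). The DataLoader's default collation stacks items along a new leading batch dimension. An item taken straight from the dataset lacks that dimension until it is unsqueezed.

```python
import torch
from torch.utils import data

# Hypothetical stand-in for RNNIndexLMDataset: each item is a pair of
# LongTensors of shape (1,), like the original __getitem__ returns.
class ToyBigrams(data.Dataset):
    def __init__(self, pairs):
        self.pairs = pairs  # list of (index_w1, index_w2)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        w1, w2 = self.pairs[i]
        return torch.LongTensor([w1]), torch.LongTensor([w2])

ds = ToyBigrams([(0, 1), (1, 2), (2, 3), (3, 4)])
loader = data.DataLoader(ds, batch_size=4)

w_batch, _ = next(iter(loader))
print(w_batch.shape)                # torch.Size([4, 1]): collation adds dim 0

w_single, _ = ds[0]
print(w_single.shape)               # torch.Size([1])
print(w_single.unsqueeze(0).shape)  # torch.Size([1, 1]): same layout as a batch of one
```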

beneyal · November 19, 2018, 7:39pm

Bingo. Now I’m not getting an error, but a vector of zeros. Probably the wrong softmax dimension. Thank you so very much, as always!
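The zeros are consistent with that guess. The LSTM output is shaped (seq_len, batch, vocab), so LogSoftmax(dim=1) normalizes over the batch dimension rather than the vocabulary. With a batch of one, each "distribution" has a single element, and its log-probability is log(1) = 0. A model trained with dim=1 also learned under that batch-wise normalization, so it would likely need retraining after the fix. The sketch below first reproduces the zeros. It then shows a hypothetical RecurrentLMSketch (not the code from the thread, with arbitrary toy sizes) that combines the permute suggestion, lets nn.LSTM create a zero state of the right batch size, and normalizes over the last (vocabulary) dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reproduce the zeros: normalizing over a size-1 batch dimension.
logits = torch.randn(1, 1, 5)                       # (seq_len=1, batch=1, vocab=5)
print(F.log_softmax(logits, dim=1))                 # all zeros
print(F.log_softmax(logits, dim=-1).exp().sum(-1))  # ~1: a proper distribution over the vocab

class RecurrentLMSketch(nn.Module):
    """Hypothetical variant of RecurrentLM, for illustration only."""

    def __init__(self, vocab_size, embedding_dim, hidden_size, num_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, words):
        # words: (batch, seq_len) word indices, e.g. (128, 1) from the DataLoader
        out = self.embedding(words)              # (batch, seq_len, embedding_dim)
        out = out.permute(1, 0, 2).contiguous()  # (seq_len, batch, embedding_dim)
        out, _ = self.lstm(out)                  # no state passed: zeros sized to the batch
        out = self.fc(out)                       # (seq_len, batch, vocab_size)
        return F.log_softmax(out, dim=-1)        # normalize over the vocabulary

model = RecurrentLMSketch(vocab_size=5, embedding_dim=8, hidden_size=16)
single = torch.tensor([[3]])    # one word index, batch of one: shape (1, 1)
log_probs = model(single)
print(log_probs.shape)          # torch.Size([1, 1, 5])
print(log_probs.exp().sum(-1))  # ~1, no longer all zeros
```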