Cannot figure out error - RuntimeError: input must have 3 dimensions, got 2

I have searched through many Stack Overflow and pytorch.org forum threads about this, and it seems to be a common error. However, the solutions I read in those threads were difficult to follow, and I couldn't apply them to make my code work. I do understand it has something to do with the size of the tensor being fed into the model, but I'm not exactly sure how to modify my code to fix it. I am still new to PyTorch, so I can't fully follow the explanations I found. I am trying to use the bidirectional LSTM from this PyTorch tutorial (https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/bidirectional_recurrent_neural_network/main.py). It uses the MNIST image dataset, but I am trying to use a financial text dataset instead. I think this is where the problem occurs; I know text and images probably need different handling, but I'm not sure what to modify. I have included my dataloader function as well.

import random

import torch
import torch.nn as nn


def dataloader(messages, labels, sequence_length=30, batch_size=32, shuffle=False):
    """
    Build a dataloader.
    """
    if shuffle:
        indices = list(range(len(messages)))
        random.shuffle(indices)
        messages = [messages[idx] for idx in indices]
        labels = [labels[idx] for idx in indices]

    total_sequences = len(messages)  # total number of twits

    for ii in range(0, total_sequences, batch_size):
        batch_messages = messages[ii: ii+batch_size]

        # First initialize a tensor of all zeros
        batch = torch.zeros((sequence_length, len(batch_messages)), dtype=torch.int64)
        for batch_num, tokens in enumerate(batch_messages):
            token_tensor = torch.tensor(tokens)
            # Left pad! start_idx is 0 if len(token_tensor) >= sequence_length
            start_idx = max(sequence_length - len(token_tensor), 0)
            # Fill this message's column, truncating to sequence_length if needed
            batch[start_idx:, batch_num] = token_tensor[:sequence_length]

        label_tensor = torch.tensor(labels[ii: ii+len(batch_messages)])

        # Note: batch is shaped [sequence_length, batch_size]
        yield batch, label_tensor


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 2
num_classes = 3
batch_size = 100
num_epochs = 2
learning_rate = 0.003

# Bidirectional recurrent neural network (many-to-one)
class BiRNN(nn.Module):

    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size*2, num_classes)  # 2 for bidirectional

    def forward(self, x):
        # Set initial states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)  # 2 for bidirectional
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size*2)        

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
 
        return out


model_2 = BiRNN(input_size, hidden_size, num_layers, num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_2.parameters(), lr=learning_rate)
train_loader = dataloader(
    train_features, train_labels, batch_size=batch_size, sequence_length=20, shuffle=True)

# Train the model
total_step = 200

for epoch in range(num_epochs):
    
    for i, (text_batch, labels) in enumerate(train_loader):
        text_batch = text_batch.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model_2(text_batch)
        loss = criterion(outputs, labels)        

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

And here is the full error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-74-8a935e97b39c> in <module>
     10 
     11         # Forward pass
---> 12         outputs = model_2(text_batch)
     13         loss = criterion(outputs, labels)
     14 

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

<ipython-input-64-21fa163d5c93> in forward(self, x)
     18 
     19         # Forward propagate LSTM
---> 20         out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size*2)
     21 
     22         # Decode the hidden state of the last time step

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\rnn.py in forward(self, input, hx)
    562             return self.forward_packed(input, hx)
    563         else:
--> 564             return self.forward_tensor(input, hx)
    565 
    566 class GRU(RNNBase):

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\rnn.py in forward_tensor(self, input, hx)
    541         unsorted_indices = None
    542 
--> 543         output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
    544 
    545         return output, self.permute_hidden(hidden, unsorted_indices)

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
    521             hx = self.permute_hidden(hx, sorted_indices)
    522 
--> 523         self.check_forward_args(input, hx, batch_sizes)
    524         if batch_sizes is None:
    525             result = _VF.lstm(input, hx, self._get_flat_weights(), self.bias, self.num_layers,

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\rnn.py in check_forward_args(self, input, hidden, batch_sizes)
    494     def check_forward_args(self, input, hidden, batch_sizes):
    495         # type: (Tensor, Tuple[Tensor, Tensor], Optional[Tensor]) -> None
--> 496         self.check_input(input, batch_sizes)
    497         expected_hidden_size = self.get_expected_hidden_size(input, batch_sizes)
    498 

~\Anaconda3\envs\thesis\lib\site-packages\torch\nn\modules\rnn.py in check_input(self, input, batch_sizes)
    143             raise RuntimeError(
    144                 'input must have {} dimensions, got {}'.format(
--> 145                     expected_input_dim, input.dim()))
    146         if self.input_size != input.size(-1):
    147             raise RuntimeError(

RuntimeError: input must have 3 dimensions, got 2

The linked code reshapes the input with:

images = images.reshape(-1, sequence_length, input_size).to(device)

which creates an input tensor of shape [batch_size, seq_len, nb_features].
In the MNIST example, sequence_length and input_size are both defined as 28, which basically slices each image into rows and fakes a temporal dimension.

I'm not sure what kind of data you are using, but you should reshape (or load) your data into the same format.
Apparently you are passing the data as a 2-dimensional tensor (probably [batch_size, nb_features]).
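
For illustration, a minimal sketch of what that reshape does to the dimensions (the sizes here just mirror the MNIST example):

import torch

images = torch.randn(100, 784)        # 2-dimensional: [batch_size, nb_features]
print(images.dim())                   # 2

images = images.reshape(-1, 28, 28)   # 3-dimensional: [batch_size, seq_len, input_size]
print(images.dim())                   # 3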

Thank you, that makes sense. I will try to figure it out when I get back to my computer. So I just have to change the parameters of the reshape method like this:

images = images.reshape(batch_size, nb_features).to(device)

correct? Thanks again!

No, you would need to add a temporal dimension, as you are using an RNN afterwards.
I’m not sure how your data is defined, but I assume you are working with some kind of temporal data, which contains features for each time step?
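
Just to illustrate what "adding a temporal dimension" means shape-wise (whether a length-1 sequence actually makes sense depends on your data):

import torch

x = torch.randn(32, 100)  # [batch_size, nb_features], 2-dimensional
x = x.unsqueeze(1)        # [batch_size, seq_len=1, nb_features], 3-dimensional
print(x.shape)            # torch.Size([32, 1, 100])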

No, there's no temporal information in my data. It's just sentences with a sentiment score. I processed/cleaned/tokenised the sentences, and the sentiment scores are also integers, so the features are just a list of numeric arrays, each array corresponding to a sentence, and the labels are just a list of integers.

Edit: Just an update. I forgot to mention that I had noticed the original author reshaping the images tensor, and I removed that reshape from my code. I have tried multiple ways of rewriting the code, but it always comes back to needing 3 dimensions when my input has 2. Even when I take the reshape out completely and modify the forward pass, it still says it requires 3. Here's what I tried when modifying the forward pass:

def forward(self, x):
    # Set initial states
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)  # 2 for bidirectional
    #c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    print('input size', x.size())

    # Forward propagate LSTM
    # (note: nn.LSTM actually expects the hidden state as a (h0, c0) tuple)
    #x = x.view(-1, input_size)
    out, _ = self.lstm(x, h0)  # out: tensor of shape (batch_size, seq_length, hidden_size*2)
    print('lstm size', out.size())

    # Decode the hidden state of the last time step
    out = self.fc(out)
    print('fc output size', out.size())
    logps = self.log_softmax(out)
    return logps

I guess what I am trying to figure out is where in the code the input is first required to be 3-dimensional, so I can change that. Thanks!

If you are dealing with "sequences", it sounds like you do have a temporal dimension. How else are these samples ordered, if not along a timeline?
If you don't have temporal information, I would assume you don't need the nn.LSTM in your model and can just remove the reshaping etc.

nn.LSTM requires a 3-dimensional input of shape [batch_size, seq_len, nb_features] (if batch_first=True, as in your code snippet).
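
For text, the usual way to get that third dimension is an embedding layer that maps each token ID to a feature vector. A minimal sketch (vocab_size and embed_dim are assumed values here; also note that your dataloader yields batches shaped [sequence_length, batch_size], which would need transposing to match batch_first=True):

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_size = 10000, 64, 128  # assumed values for illustration
embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, vocab_size, (32, 20))  # 2-dimensional: [batch_size, seq_len]
embedded = embedding(token_ids)                     # 3-dimensional: [batch_size, seq_len, embed_dim]
out, _ = lstm(embedded)
print(embedded.shape)  # torch.Size([32, 20, 64])
print(out.shape)       # torch.Size([32, 20, 256]), hidden_size*2 for bidirectional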

Ah okay, yes, I understand what you mean by temporal now. I tried setting batch_first=False, but it still requires 3 dimensions. Is there any way to change the model to require only 2 dimensions, and if not, how would you make text data have 3 dimensions? Thanks again for all your help!