RNN: input and target shapes do not match

I’m trying to predict one step ahead using the 4 previous time steps (lag = 4). So my input shape is (1,4,1): one batch, 4 time steps and 1 input feature. The target shape is (1,1) because I only need to predict one step ahead.

It gives me the following error:

RuntimeError: input and target shapes do not match: input [4 x 1], target [1] at /pytorch/aten/src/THNN/generic/MSECriterion.c:12

Here is my code:

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()
        # store hidden_dim; forward() uses it when reshaping r_out
        self.hidden_dim = hidden_dim

        # define an RNN with specified parameters
        # batch_first means that the first dim of the input and output will be the batch_size
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=False)
        # last, fully-connected layer
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x (batch_size, seq_length, input_size)
        # hidden (n_layers, batch_size, hidden_dim)
        # r_out (batch_size, time_step, output_size)
        batch_size = x.size(0)
        # get RNN outputs
        r_out, hidden = self.rnn(x, hidden)
        # shape output to be (batch_size*seq_length, hidden_dim)
        r_out = r_out.view(-1, self.hidden_dim)  
        # get final output 
        output = self.fc(r_out)
        return output, hidden


# instantiate an RNN
rnn = RNN(input_size, output_size, hidden_dim, n_layers)

# train the RNN
def train(rnn, n_epoch, train):
    for e in range(n_epoch):
        # initialize the hidden state
        hidden = None 
        train_loss = 0
        for i in range(len(train)):
            x, y = train[i, 0:-1], train[i, -1]
            x = x.reshape(1,x.shape[0])      
            y = np.reshape(1,1)
            # convert data into Tensors
            x_tensor = torch.Tensor(x).unsqueeze(0) # unsqueeze gives a 1, batch_size dimension
            y_tensor = torch.Tensor(y)

            # outputs from the rnn
            prediction, hidden = rnn(x_tensor, hidden)

            ## Representing Memory ##
            # make a new variable for hidden and detach the hidden state from its history
            # this way, we don't backpropagate through the entire history
            hidden = hidden.data

            # calculate the loss
            loss = criterion(prediction, y_tensor)
            # zero gradients
            # perform backprop and update weights
            train_loss += loss.item()
        # calculate average loss over an epoch
        train_loss = train_loss/len(train)
        print('Epoch: {} \tTraining Loss: {:.6f}'.format(e, train_loss))
    return rnn

trained_rnn = train(rnn, 10, train_scaled)

First problem, and the most important one in your code:

r_out = r_out.view(-1, self.hidden_dim)

You need to know the shape of the RNN output, which is (batch, seq_len, num_directions * hidden_size) for batch_first=True and (seq_len, batch, num_directions * hidden_size) for batch_first=False. PyTorch stacks the output of every time step, which is why one dim of the output is seq_len. Flattening it with view(-1, self.hidden_dim) keeps all 4 steps, so the linear layer returns 4 predictions while the target holds only one value. When you predict 1 step ahead, you don’t need every output of the RNN. What you really need from r_out is the last step: r_out[:, -1, :] with batch_first=True. The alternative is to use a combination of the outputs of all the steps. One classic way to do that is an attention mechanism.
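To make the shapes concrete, here is a minimal sketch of the last-step fix. The class name LastStepRNN and the sizes (hidden_dim=8, random input and target) are made up for illustration and are not from the original post; it assumes batch_first=True.

```python
import torch
import torch.nn as nn

class LastStepRNN(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len, input_size)
        r_out, hidden = self.rnn(x, hidden)   # r_out: (batch, seq_len, hidden_dim)
        last = r_out[:, -1, :]                # keep only the last step: (batch, hidden_dim)
        return self.fc(last), hidden          # prediction: (batch, output_size)

model = LastStepRNN(input_size=1, output_size=1, hidden_dim=8, n_layers=1)
x = torch.randn(1, 4, 1)   # one batch, 4 time steps, 1 feature
y = torch.randn(1, 1)      # one-step-ahead target

prediction, hidden = model(x)
loss = nn.MSELoss()(prediction, y)  # shapes now agree: (1, 1) vs (1, 1)
```

With this change the prediction has one row per batch element instead of one row per time step, so it lines up with a (1,1) target.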


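Second problem: the input layout does not match the RNN's setting. In

x_tensor = torch.Tensor(x).unsqueeze(0) # unsqueeze gives a 1, batch_size dimension

the 1st dim of x_tensor is batch_size, but in your RNN

self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=False)

you set batch_first=False, so the RNN reads the 1st dim as seq_len and the 2nd as batch. Either set batch_first=True or feed the input as (seq_len, batch, input_size).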
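A quick way to see the difference is to pass the same (1,4,1) tensor through two RNNs that differ only in batch_first and compare the hidden state shapes. This is just a sketch; hidden_size=8 is an arbitrary choice.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 1)  # intended as (batch=1, seq_len=4, input_size=1)

# batch_first=True: x is read as (batch, seq_len, input_size)
rnn_bf = nn.RNN(input_size=1, hidden_size=8, num_layers=1, batch_first=True)
out_bf, h_bf = rnn_bf(x)
# h_bf: (num_layers, batch, hidden_size) = (1, 1, 8) -> one sequence of 4 steps

# batch_first=False: the same x is read as (seq_len, batch, input_size)
rnn_sf = nn.RNN(input_size=1, hidden_size=8, num_layers=1, batch_first=False)
out_sf, h_sf = rnn_sf(x)
# h_sf: (1, 4, 8) -> four independent sequences of length 1

print(h_bf.shape, h_sf.shape)
```

The output tensors have the same shape in both cases, which is why the mistake is easy to miss; the hidden state shows that with batch_first=False the 4 lags were treated as 4 separate samples rather than one sequence.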


Thanks a lot, @Hong. So to predict one step ahead I just need the last output in the sequence, and I have also set batch_first=True.

Hi @BlueWolf90. From the programming perspective, when you do

r_out = r_out.view(-1, self.hidden_dim)

you are flattening the outputs of all steps into one (batch_size*seq_length, hidden_dim) tensor, so the linear layer produces one prediction per step, and that causes the shape mismatch with a single-value target.
From the project’s perspective, the last step's output encodes the “information” from the previous steps, which is sufficient to predict the next step in most applications. For some applications, however, that might not be the case. I believe it is good practice to also try a weighted sum of all the steps’ outputs (a simple way to apply an attention mechanism) and to use a validation dataset to find out which works better.
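For reference, a weighted sum over the step outputs could look roughly like the sketch below. The class name AttnPoolRNN and the single linear scoring layer are my own assumptions, one simple variant among many, not a fixed recipe; it assumes batch_first=True.

```python
import torch
import torch.nn as nn

class AttnPoolRNN(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)   # one learned score per time step
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden=None):
        r_out, hidden = self.rnn(x, hidden)                 # (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.score(r_out), dim=1)   # (batch, seq_len, 1), sums to 1 over steps
        context = (weights * r_out).sum(dim=1)              # weighted sum: (batch, hidden_dim)
        return self.fc(context), hidden, weights

model = AttnPoolRNN(input_size=1, output_size=1, hidden_dim=8, n_layers=1)
x = torch.randn(1, 4, 1)
prediction, hidden, weights = model(x)
```

The prediction still has shape (batch, output_size), so it can be swapped in for the last-step version and the two compared on the validation set.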