Input and Target size mismatch in RNNs

I am trying to implement a univariate time series model using vanilla RNNs. Following is the error I encounter:

/home/ggmu/anaconda3/envs/toothless/lib/python3.6/site-packages/torch/nn/modules/loss.py:443: UserWarning: Using a target size (torch.Size([5, 1, 1])) that is different to the input size (torch.Size([5, 10, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)

Input size = (90, 10)
Output size = (90, 1)

Config:

batch_size = 5
input_size = 1
sequence_length = 10
hidden_size = 1
num_layer = 3

The RNN class:

class ModelRnn(nn.Module):
    def __init__(self):
        super(ModelRnn, self).__init__()
        
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True, num_layers=num_layer)
        
    def forward(self, x, hidden):
        hidden = torch.zeros(num_layer, batch_size, hidden_size)

        x = x.view(batch_size, sequence_length, input_size)
        out, hidden = self.rnn(x, hidden)
        
        return hidden, out

The data loader:

train_data = trainData(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=False, num_workers=2)
model = ModelRnn()
print(model)

#    ModelRnn(
#    (rnn): RNN(1, 1, num_layers=3, batch_first=True)
#   )
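(trainData is just a thin Dataset wrapper around the two tensors; for completeness, it is roughly this:)

from torch.utils.data import Dataset, DataLoader

class trainData(Dataset):
    # Minimal wrapper: X is (90, 10) -> 90 sequences of length 10,
    # y is (90, 1) -> one target value per sequence.
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        # After batching with batch_size=5: x_batch is (5, 10), y_batch is (5, 1)
        return self.X[idx], self.y[idx]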

Optimizer:

optimizer=torch.optim.Adam(model.parameters(),lr=0.01)
criterion=nn.MSELoss()

Train loop:

for epoch in range(20):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        hidden, output = model(x_batch, hidden)
        y_batch = y_batch.view(-1, 1, 1)
        loss = criterion(output,  y_batch)
        loss.backward()
        optimizer.step()
        
    print(f"{epoch+1} epoch | loss = {loss}")

Following is the output + error:

1 epoch | loss = 9323.71484375

/home/ggmu/anaconda3/envs/toothless/lib/python3.6/site-packages/torch/nn/modules/loss.py:443: UserWarning: Using a target size (torch.Size([5, 1, 1])) that is different to the input size (torch.Size([5, 10, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)

2 epoch | loss = 9261.478515625
3 epoch | loss = 9235.8857421875
4 epoch | loss = 9227.794921875
5 epoch | loss = 9224.625
6 epoch | loss = 9222.990234375
7 epoch | loss = 9221.98046875
8 epoch | loss = 9221.2890625
9 epoch | loss = 9220.783203125
10 epoch | loss = 9220.400390625
11 epoch | loss = 9220.099609375
12 epoch | loss = 9219.857421875
13 epoch | loss = 9219.6591796875
14 epoch | loss = 9219.4951171875
15 epoch | loss = 9219.3564453125
16 epoch | loss = 9219.23828125
17 epoch | loss = 9219.1357421875
18 epoch | loss = 9219.0478515625
19 epoch | loss = 9218.9697265625
20 epoch | loss = 9218.9013671875

I can’t figure out the problem. The target size should be different from the input size, because a sequence of 10 inputs gives out 1 output. So target size = torch.Size([5, 1, 1]) should be different from input size = torch.Size([5, 10, 1]).

Hey @scarecrow21,

The target size should be different from the input size, because a sequence of 10 inputs gives out 1 output. So target size = torch.Size([5, 1, 1]) should be different from input size = torch.Size([5, 10, 1]).

I think the “input” that the error message refers to is the input of the F.mse_loss function, which in this case is the output of your RNN, not the input you feed to the network.
So there is a mismatch between the size of your RNN’s output and the size of the target you pass to the loss function.
According to the docs, an RNN returns two tensors, output and h_n: the first has shape (seq_len, batch, num_directions * hidden_size) (or (batch, seq_len, num_directions * hidden_size) with batch_first=True), the second has shape (num_layers * num_directions, batch, hidden_size).
So it is not true that a sequence of 10 inputs gives out 1 output: output contains the output features (h_t) from the last layer of the RNN for every time step t. With your configuration (and batch_first=True) that is [batch_size=5, seq_len=10, hidden_size=1].
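You can check this with a tiny standalone snippet that uses your configuration (shapes only):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=1, num_layers=3, batch_first=True)
x = torch.randn(5, 10, 1)      # (batch_size, seq_len, input_size)
h0 = torch.zeros(3, 5, 1)      # (num_layers, batch_size, hidden_size)
out, h_n = rnn(x, h0)
print(out.shape)               # torch.Size([5, 10, 1]) -> one output per time step
print(h_n.shape)               # torch.Size([3, 5, 1])  -> final hidden state of each layer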
Now, take a look at the forward function of your ModelRnn:

def forward(self, x, hidden):
    hidden = torch.zeros(num_layer, batch_size, hidden_size)

    x = x.view(batch_size, sequence_length, input_size)
    out, hidden = self.rnn(x, hidden)

    return hidden, out

Since you swap the positions of the hidden and out tensors before returning them, the tensors from hidden, output = model(x_batch, hidden) in your training loop have shapes [num_layers*num_directions=3, batch_size=5, hidden_size=1] and [batch_size=5, seq_len=10, num_directions*hidden_size=1] respectively (since you instantiated your RNN with batch_first=True).
So when you compute the loss, you are computing it between two tensors, one of shape [5, 10, 1] and one of shape [5, 1, 1].
In other words, you are comparing the whole sequence of outputs, one per time step, against a single target value per sequence, and MSELoss broadcasts that target across the sequence dimension, which is exactly what the warning complains about.
If you are only interested in the last element of the sequence for each batch, I would suggest modifying the forward function so that it returns only the last element of the sequence, or slicing the tensor manually before you compute the loss.
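For example, something along these lines (just a sketch) would make the two shapes match:

# option 1: inside forward(), after out, hidden = self.rnn(x, hidden)
out = out[:, -1, :]                     # keep only the last time step -> (batch_size, hidden_size)

# option 2: in the training loop, right before the loss
output = output[:, -1, :].unsqueeze(1)  # (5, 1, 1), same shape as y_batch.view(-1, 1, 1)
loss = criterion(output, y_batch)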

Hope everything is clear!

Cheers :slight_smile:


I changed the output variable in the training loop before passing it to the loss function:

output = output[:, -1, :]

This selects the last element of the sequence in each batch.
I don’t get the size mismatch error anymore.
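The training loop now looks roughly like this (I also reshape y_batch to (batch_size, 1) so the two tensors have exactly the same shape):

hidden = None    # placeholder; forward() re-creates the hidden state anyway

for epoch in range(20):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        hidden, output = model(x_batch, hidden)
        output = output[:, -1, :]        # last time step only -> (5, 1)
        y_batch = y_batch.view(-1, 1)    # (5, 1), matches the output shape
        loss = criterion(output, y_batch)
        loss.backward()
        optimizer.step()

    print(f"{epoch+1} epoch | loss = {loss}")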

Thanks a lot! :smiley:


You’re welcome buddy :wink:

Cheers :slight_smile: