Expected input batch_size (1536) to match target batch_size (512)

I am building a simple recurrent neural network for CIFAR10 image classification and I am getting a batch size mismatch error.

Data Preprocessing

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

all_transforms = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                         std=[0.2023, 0.1994, 0.2010])
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, transform=all_transforms, download=True)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, transform=all_transforms, download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=512, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=512, shuffle=True)
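For reference, a quick check on one batch from the train_loader defined above shows the shape the model later has to handle, [batch_size, channels, height, width]:

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([512, 3, 32, 32])
print(labels.shape)   # torch.Size([512])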

Simple RNN

class SimpleRNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, num_classes):
        super(SimpleRNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_dim = hidden_dim
        # batch_first=True -> the RNN expects input of shape [batch, seq_len, input_dim]
        self.rnn = nn.RNN(input_size=input_dim, hidden_size=hidden_dim, num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)  # map the last hidden state to class scores

    def forward(self, x):
        # initial hidden state: [num_layers, batch, hidden_dim]; `device` is assumed to be defined earlier
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).to(device)
        out, _ = self.rnn(x, h0)
        out = out[:, -1, :]  # keep only the last timestep (many-to-one)
        out = self.fc(out)
        return out

Shapes observed during a forward pass:

x.shape               --> torch.Size([1536, 32, 32])
h0.shape              --> torch.Size([2, 1536, 128])
rnn out.shape         --> torch.Size([1536, 32, 128])
out[:, -1, :].shape   --> torch.Size([1536, 128])
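
As a sanity check, here is a minimal sketch (the device definition is an assumption, matching the original use of a `device` variable) showing that an input which keeps the batch dimension at 512 produces an output that matches the 512 labels:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleRNN(input_dim=32, hidden_dim=128, num_layers=2, num_classes=10).to(device)

dummy = torch.randn(512, 32, 32, device=device)  # [batch_size, sequence_length, input_size]
print(model(dummy).shape)                         # torch.Size([512, 10])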

Training Loop

input_size = 32
hidden_size = 128
num_layers = 2
num_classes = 10
sequence_length = 32

for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        # reshape the [batch, 3, 32, 32] images to [-1, sequence_length, input_size]
        images = images.reshape(-1, sequence_length, input_size).cuda()
        labels = labels.cuda()

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print("Epoch [{}/{}], Loss: {:.4f}".format(epoch + 1, epochs, loss.item()))

Traceback

Traceback (most recent call last):
  File "/media/cvpr/CM_1/tutorials/pytorch_simple_rrn.py", line 62, in <module>
    loss = criterion(outputs, labels)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 961, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/functional.py", line 2261, in nll_loss
    raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
ValueError: Expected input batch_size (1536) to match target batch_size (512).

Based on your setup, a batch size of 512 is expected, while the input seems to have a batch size of 1536, so I guess:

images = images.reshape(-1, sequence_length, input_size)

is wrong as it seems to move the pixel dimensions to the batch dimension.
Check the values of sequence_length and input_size and make sure they are not changing the batch size.
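
To make that concrete, here is a small shape-only sketch of what that reshape does to one CIFAR10 batch:

batch = torch.randn(512, 3, 32, 32)   # one CIFAR10 batch: [batch, channels, height, width]
flat = batch.reshape(-1, 32, 32)      # sequence_length = input_size = 32
print(flat.shape)                     # torch.Size([1536, 32, 32]) -> 512 * 3 "samples"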

I have also changed the batch size from 512 to 1536, but it didn't work out. Both input_size and sequence_length are 32.

This won’t work as CIFAR10 contains images with 3 channels.
If both of these values are set to 32, the batch size will increase by a factor of 3, to 512*3 = 1536.

How can we find correct values for both input_size and sequence_length?

It depends on your use case and how you want to interpret an input of shape [batch_size, channels=3, height=32, width=32] as a temporal signal of shape [batch_size, sequence_length, features].
In any case, you should not change the batch dimension, as that will interleave the signal in the batch dimension and thus yield the current shape mismatch.

Do you mean that I should reshape the torch.Size([1536, 32, 32]) tensor into [batch_size, channels=3, height=32, width=32]?

No, right now you are doing the opposite: reshaping an input of [batch_size, 3, 32, 32] (CIFAR10 images) to [1536, 32, 32] and passing this new (temporal) signal to the RNN, which is wrong since you have changed the number of samples.
You would have to check your use case and decide how you want to interpret the spatial signal (a batch of CIFAR10 images) as a temporal signal (containing a sequence and a feature dimension).
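
As one possible interpretation (an assumption, not the only valid one): treat each of the 32 image rows as a timestep and concatenate the 3 channels along the feature dimension, which keeps the batch size intact but requires input_dim = 3 * 32 = 96:

batch = torch.randn(512, 3, 32, 32)                       # [batch, channels, height, width]
seq = batch.permute(0, 2, 1, 3).reshape(512, 32, 3 * 32)  # [batch, seq_len=32 rows, features=96]
print(seq.shape)                                          # torch.Size([512, 32, 96])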

I am implementing simple many-to-one RNN logic as defined for the MNIST dataset. If I simply change the reshape call in the training loop from images.reshape(-1, sequence_length, input_size).cuda() to images.cuda(), I get this error:

Traceback (most recent call last):
  File "/media/cvpr/CM_1/tutorials/pytorch_simple_rrn.py", line 63, in <module>
    outputs = model(images)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/cvpr/CM_1/tutorials/pytorch_simple_rrn.py", line 44, in forward
    out, _ = self.rnn(x, h0)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 231, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 199, in check_forward_args
    self.check_input(input, batch_sizes)
  File "/home/cvpr/anaconda3/envs/tutorials/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 174, in check_input
    raise RuntimeError(
RuntimeError: input must have 3 dimensions, got 4

I understand the error as follows: in the training loop the images have 4 dimensions, torch.Size([32, 3, 32, 32]), but the RNN expects 3 dimensions.
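
For completeness, here is a hedged sketch of one possible fix, assuming the row-as-timestep interpretation above (input_size = 3 * 32 = 96); the criterion, optimizer choice, and epoch count are assumptions, since the original post does not show them:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

sequence_length = 32   # one timestep per image row
input_size = 3 * 32    # 3 channels * 32 pixels per row = 96 features per timestep
epochs = 10            # example value

model = SimpleRNN(input_dim=input_size, hidden_dim=128, num_layers=2, num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer choice is an assumption

for epoch in range(epochs):
    for images, labels in train_loader:
        # [batch, 3, 32, 32] -> [batch, 32, 96]; the batch dimension stays untouched
        images = images.permute(0, 2, 1, 3).reshape(images.size(0), sequence_length, input_size).to(device)
        labels = labels.to(device)

        outputs = model(images)            # [batch, 10] now matches the [batch] labels
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()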