ValueError: Expected input batch_size (192) to match target batch_size (64)

I’m working with an RNN for image classification. I assume my problem is that my loaded images have 3 channels (RGB); with only one channel it would work, I guess. Since there are 3 channels, the input batch size is 3 times larger than the target batch size. Any ideas how I can solve this?

I also checked this post, which has helped me get this far. But I’m stuck with the 3 channels now.

The Input Shape is torch.Size([64, 3, 224, 224])
X after permute is torch.Size([224, 192, 224])

My parameters:


  • BATCH_SIZE = 64
  • N_STEPS = 28
  • N_INPUTS = 224
  • N_CHANNELS = 3
  • N_NEURONS = 150
  • N_OUTPUTS = 21
  • N_EPOCHS = 5
  • N_PIXELS = 224

import torch
import torch.nn as nn

class ImageRNN(nn.Module):
    def __init__(self, batch_size, n_steps, n_inputs, n_neurons, n_outputs):
        super(ImageRNN, self).__init__()

        self.n_neurons = n_neurons
        self.batch_size = batch_size
        self.n_steps = n_steps
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs

        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)

        self.FC = nn.Linear(self.n_neurons, self.n_outputs)

    def init_hidden(self):
        # (num_layers, batch_size, n_neurons)
        return torch.zeros(1, self.batch_size, self.n_neurons)

    def forward(self, X):
        # transforms X to dimensions: n_steps X batch_size X n_inputs
        X = X.permute(1, 0, 2)
        self.batch_size = X.size(1)
        self.hidden = self.init_hidden()

        # nn.RNN returns (output for every step, final hidden state)
        rnn_out, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.FC(self.hidden)

        return out.view(-1, self.n_outputs) # batch_size X n_output

for i, data in enumerate(trainloader):
    # zero the parameter gradients

    # reset hidden states
    model.hidden = model.init_hidden()

    # get the inputs
    inputs, labels = data
    inputs = inputs.view(-1, N_PIXELS, N_PIXELS)

    # forward + backward + optimize
    outputs = model(inputs)

    loss = criterion(outputs, labels)

What size do you want after .permute?

I want the batch size to be 64.

Before permute it was [64,3,224,224].
After permute, [64, ?, ?] ?

Based on your code, the RNN must accept input of size (64, 224) at each step. How can you squash (64, 3, 224, 224) into (64, 224)?
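To see where the 192 comes from, the shapes can be traced step by step. This is only a minimal sketch: the zero tensors are dummy stand-ins for one batch from trainloader, which is an assumption, not the real data.

```python
import torch

BATCH_SIZE, N_CHANNELS, N_PIXELS = 64, 3, 224

# Dummy batch standing in for one batch from trainloader (assumption).
inputs = torch.zeros(BATCH_SIZE, N_CHANNELS, N_PIXELS, N_PIXELS)
labels = torch.zeros(BATCH_SIZE, dtype=torch.long)

# The view from the training loop folds the channel axis into the batch axis.
flat = inputs.view(-1, N_PIXELS, N_PIXELS)
print(flat.shape)   # torch.Size([192, 224, 224])

# forward() then swaps the first two axes: n_steps x batch_size x n_inputs.
X = flat.permute(1, 0, 2)
print(X.shape)      # torch.Size([224, 192, 224])

# The RNN therefore sees a batch of 192, while labels still has 64 entries.
print(X.size(1), labels.size(0))   # 192 64
```

So each image is treated as three separate single-channel samples, and the loss then compares 192 predictions against 64 labels.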

Is there any way I can use the 3 channels without losing information?

I’m not sure of an exact way, but you can flatten X so that its shape becomes [64, 3*224*224], and change your RNN’s input size to 3*224*224.
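A rough sketch of that flattening idea, under some assumptions: the image size is shrunk to a hypothetical 32 x 32 and the batch to 4 so it runs quickly, and the flattened image is fed to the RNN as a sequence of length 1.

```python
import torch
import torch.nn as nn

# Small hypothetical sizes so the sketch runs quickly;
# the thread itself uses a batch of 64 images of 3 x 224 x 224.
BATCH_SIZE, N_CHANNELS, N_PIXELS = 4, 3, 32
N_NEURONS = 150

inputs = torch.zeros(BATCH_SIZE, N_CHANNELS, N_PIXELS, N_PIXELS)

# Flatten everything except the batch axis: [B, C*H*W].
flat = inputs.view(BATCH_SIZE, -1)

# nn.RNN expects (n_steps, batch_size, n_inputs) by default,
# so add a step axis of length 1.
X = flat.unsqueeze(0)

rnn = nn.RNN(N_CHANNELS * N_PIXELS * N_PIXELS, N_NEURONS)
rnn_out, hidden = rnn(X)   # the initial hidden state defaults to zeros

print(flat.shape)     # torch.Size([4, 3072])
print(hidden.shape)   # torch.Size([1, 4, 150])
```

With 224 x 224 images the input size becomes 150528, and with a single time step the RNN has no sequence left to process, so this keeps all the pixels but gives up the recurrent part.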

Thanks for the input. I was thinking of reshaping my [64, 3, 224, 224] to [64, 1, 672, 224], so I just append the channels. I wouldn’t lose information. What do you think about that solution, does it make sense?

Yes, if X is reshaped to something like [64, 1, 672, 224], you would again need to arrive at a shape of [64, n_inputs], i.e. [64, 224], so that it can be fed into the RNN. Two options:

  1. Reshape it from [64, 3, 224, 224] to [64, 150528], then apply a feed-forward layer to get the desired size for the basic_rnn layer.
  2. Apply 2 or 3 convolutional layers and bring their output to the desired shape.

The second approach will also learn some spatial features, thanks to the conv layers.
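Both options could look roughly like this. It is a sketch only: the batch is cut to 4 to keep it light, and the conv layer sizes (8 intermediate channels, kernel 3, stride 2) are hypothetical choices, not something from the thread.

```python
import torch
import torch.nn as nn

BATCH_SIZE, N_CHANNELS, N_PIXELS = 4, 3, 224   # small batch to keep the sketch light
N_INPUTS = 224

inputs = torch.zeros(BATCH_SIZE, N_CHANNELS, N_PIXELS, N_PIXELS)

# Option 1: flatten, then project down to N_INPUTS with a feed-forward layer.
project = nn.Linear(N_CHANNELS * N_PIXELS * N_PIXELS, N_INPUTS)
x1 = project(inputs.view(BATCH_SIZE, -1))   # [4, 224]
x1 = x1.unsqueeze(0)                        # [1, 4, 224]: a single time step

# Option 2: two strided convolutions (hypothetical sizes) down to one channel.
convs = nn.Sequential(
    nn.Conv2d(N_CHANNELS, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, stride=2, padding=1),
)
x2 = convs(inputs)                          # [4, 1, 56, 56]
x2 = x2.squeeze(1).permute(1, 0, 2)         # [56, 4, 56]: rows become time steps

print(x1.shape)   # torch.Size([1, 4, 224])
print(x2.shape)   # torch.Size([56, 4, 56])
```

In both cases the batch axis stays at the image count, so it matches the labels. With the conv sizes assumed here, the RNN would need n_inputs = 56 instead of 224.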

Okay, what I do now, before feeding my image batch into the network, is change
inputs = inputs.view(-1, N_PIXELS, N_PIXELS)
to
inputs = inputs.view(-1, N_CHANNELS*N_PIXELS, N_PIXELS)

which just appends all channels. Thanks again for your help @chetan_patil.
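For anyone landing here later, this is roughly how the fix plays out end to end. It is a minimal sketch with a trimmed-down version of the ImageRNN above and dummy zero tensors standing in for a real trainloader batch, which is an assumption.

```python
import torch
import torch.nn as nn

BATCH_SIZE, N_CHANNELS, N_PIXELS = 64, 3, 224
N_INPUTS, N_NEURONS, N_OUTPUTS = 224, 150, 21

class ImageRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons, n_outputs):
        super().__init__()
        self.basic_rnn = nn.RNN(n_inputs, n_neurons)
        self.FC = nn.Linear(n_neurons, n_outputs)

    def forward(self, X):
        X = X.permute(1, 0, 2)          # n_steps x batch_size x n_inputs
        _, hidden = self.basic_rnn(X)   # hidden: 1 x batch_size x n_neurons
        return self.FC(hidden).view(-1, self.FC.out_features)

model = ImageRNN(N_INPUTS, N_NEURONS, N_OUTPUTS)
criterion = nn.CrossEntropyLoss()

# Dummy data standing in for one trainloader batch (assumption).
inputs = torch.zeros(BATCH_SIZE, N_CHANNELS, N_PIXELS, N_PIXELS)
labels = torch.zeros(BATCH_SIZE, dtype=torch.long)

# Stack the three channels along the row axis: [64, 672, 224].
inputs = inputs.view(-1, N_CHANNELS * N_PIXELS, N_PIXELS)

outputs = model(inputs)
loss = criterion(outputs, labels)

print(inputs.shape)    # torch.Size([64, 672, 224])
print(outputs.shape)   # torch.Size([64, 21])
```

The batch size now matches the labels. The RNN steps through 672 rows per image (the red rows, then green, then blue), so the sequence is three times longer than with a single channel.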

You’re welcome @TheDoctor.