ValueError: Expected input batch_size (48) to match target batch_size (12)

Jishan · June 14, 2020, 5:38pm

Probably due to my unclear understanding of the forward pass function, I have made a mistake in the following code:

# data loader parameters
training_loader = torch.utils.data.DataLoader(training_data,
                                           batch_size=12, 
                                           num_workers=0,
                                           shuffle=True)


class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN

        total_dog_classes = 133

        self.conv1 = nn.Conv2d(3, 32, 3, padding = 1)    
        self.norm2d1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding = 1) 
        self.conv3 = nn.Conv2d(64, 128, 3, padding = 1)  
        self.conv4 = nn.Conv2d(128, 256, 3, padding = 1)

        ## Max Pooling Layer
        self.pool = nn.MaxPool2d(2, 2)

        ## Dropout Layer
        self.dropout = nn.Dropout(0.2)

        ## Linear Layer
        self.fc1 = nn.Linear(256 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, total_dog_classes)


    def forward(self, x):
        ## Define forward behavior
        x = self.pool(F.relu(self.norm2d1(self.conv1(x))))
        print(x.shape)

        x = self.pool(F.relu(self.conv2(x)))
        print(x.shape)

        x = self.pool(F.relu(self.conv3(x)))
        print(x.shape)

        x = self.pool(F.relu(self.conv4(x)))
        print(x.shape)

        # Flatten Image and Add Dropout Layer
        x = self.dropout(x.view(-1, 256 * 7 * 7))

        # Add Second Hidden Layer
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.fc2(x)
        return x

And here is the printed output of the shapes:

torch.Size([12, 32, 112, 112])
torch.Size([12, 64, 56, 56])
torch.Size([12, 128, 28, 28])
torch.Size([12, 256, 14, 14])
torch.Size([48, 133])

This leads to ValueError: Expected input batch_size (48) to match target batch_size (12). It would really help if the calculations were also explained. Thanks!
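For reference, the shape arithmetic behind those printed sizes can be sketched in plain Python. This is only a sketch: it assumes 224x224 input images, which is inferred from the 112x112 printed after the first pooling step and is not stated in the post, and conv_out and pool_out are hypothetical helper names, not PyTorch functions.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial size after a Conv2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after a MaxPool2d layer with no padding."""
    return (size - kernel) // stride + 1


batch = 12
side = 224  # assumed input height/width, inferred from the printed shapes

# Four conv + pool blocks: the 3x3 conv with padding=1 keeps the size,
# and each 2x2 pool halves it (224 -> 112 -> 56 -> 28 -> 14).
for _ in range(4):
    side = pool_out(conv_out(side))

features = 256 * side * side   # features per image after conv4 + pool
total = batch * features       # elements in the whole batch tensor

# x.view(-1, 256 * 7 * 7) spreads those elements over rows of 12544 values,
# so the first dimension is no longer the batch size.
rows = total // (256 * 7 * 7)

print(side, features, rows)  # 14 50176 48
```

Under that assumption each image carries 256 * 14 * 14 features, four times the 256 * 7 * 7 the view call expects, so 12 images are reshaped into 48 rows.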

Nikronic · June 14, 2020, 6:06pm

Hi,

Your model definition looks correct to me; with a 120x120 input, its output has the shape [batch_size, 133]:

x = torch.randn(12, 3, 120, 120)
Net()(x).shape

Could you explain how you printed these values, and on which line?

Also, I think the error comes from somewhere other than the model. Could you share the full stack trace?

Bests

Jishan · June 14, 2020, 6:08pm

Hi, sorry, I did not post how I got the output. This is where I printed it (full code below):

            ## Forward Pass
            output = model(data)
            print(output.shape) # <<<<<<  PRINTED HERE

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.inf  # np.Inf was removed in NumPy 2.0
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## find the loss and update the model parameters accordingly
            ## record the average training loss, using something like
            ## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            
            ## Clear the gradients of all optimized variables
            optimizer.zero_grad()
            
            ## Forward Pass
            output = model(data)
            print(output.shape) # <<<<<<  PRINTED HERE

            ## Calculate the Batch Loss
            loss = criterion(output, target)
            
            ## Backward Pass
            loss.backward()
            
            ## Optimization Step (1)
            optimizer.step()
            
            ## Training Loss Recalculation - as above
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            
        ######################    
        # validate the model #
        ######################
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## update the average validation loss
            ## forward pass
            output = model(data)
            
            ## Calculate the Batch Loss
            loss = criterion(output, target)
            
            ## Validation Loss Calculation
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))
            
            
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        
        ## TODO: save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
            
    # return trained model
    return model


# train the model
model_scratch = train(12, loaders_scratch, model_scratch, optimizer_scratch, 
                      criterion_scratch, use_cuda, 'model_scratch.pt')

# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))

The full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-7e7db3f74a6e> in <module>
     83 
     84 # train the model
---> 85 model_scratch = train(60, loaders_scratch, model_scratch, optimizer_scratch, 
     86                       criterion_scratch, use_cuda, 'model_scratch.pt')
     87 

<ipython-input-28-7e7db3f74a6e> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
     33 
     34             ## Calculate the Batch Loss
---> 35             loss = criterion(output, target)
     36 
     37             ## Backward Pass

~\.conda\envs\deep-learning\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~\.conda\envs\deep-learning\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    929 
    930     def forward(self, input, target):
--> 931         return F.cross_entropy(input, target, weight=self.weight,
    932                                ignore_index=self.ignore_index, reduction=self.reduction)
    933 

~\.conda\envs\deep-learning\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2315     if size_average is not None or reduce is not None:
   2316         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2317     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2318 
   2319 

~\.conda\envs\deep-learning\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2110 
   2111     if input.size(0) != target.size(0):
-> 2112         raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
   2113                          .format(input.size(0), target.size(0)))
   2114     if dim == 2:

ValueError: Expected input batch_size (48) to match target batch_size (12).

Nikronic · June 14, 2020, 6:35pm

As I said, I don't think the issue is in the model. The input to the loss has a batch size of 48 but, strangely, the target has a batch size of 12. The cause may be the loader; there is probably a mistake there.
Try running one iteration over the DataLoader so you can make sure the tensor shapes are correct.

Something like this:

next(iter(train_loader))
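A slightly fuller version of that check might look like the sketch below. The random tensors wrapped in a TensorDataset are a hypothetical stand-in for the real dog images (the thread does not show the dataset), and the 224x224 size is an assumption; with the real loader, only the last three lines are needed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 24 fake RGB images with labels in [0, 133).
images = torch.randn(24, 3, 224, 224)
labels = torch.randint(0, 133, (24,))
loader = DataLoader(TensorDataset(images, labels), batch_size=12, shuffle=True)

# Pull a single batch and compare the leading dimensions.
data, target = next(iter(loader))
print(data.shape)    # torch.Size([12, 3, 224, 224])
print(target.shape)  # torch.Size([12])
```

If both tensors report 12 in the first dimension, the loader is delivering matching batches, and any mismatch seen at the loss must be introduced later, between the loader and the criterion.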

PS. If you are doing Udacity’s assignment, I highly suggest working through all the issues yourself, as it helps you learn how to trace errors back and understand the common issues. I mention this because I have seen code very similar to yours on this forum (possibly an already implemented assignment), and I think that is not the real intention of learning.

Jishan · June 14, 2020, 6:37pm

Hi, you are correct - I am trying to do the assignment - but I am badly stuck at this model implementation :(. Nevertheless, thanks a lot for the pointer. I will research a bit more and try to fix it.

Nikronic · June 14, 2020, 6:46pm

One point I need to mention: instead of writing the code for all sections and then running the model to find bugs, build small parts of the code, such as a single layer or just one Dataset/DataLoader, and try interacting with them on arbitrary inputs using basic Python/PyTorch functions. For instance, you should be able to extract a single batch of images and display them, or play with almost anything in your code. You really must understand the mechanism of each module you use, such as model.train(), model.eval(), etc.
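As an illustration of testing one small piece at a time, a single conv + pool stage can be probed with a dummy tensor before the full network exists. This is a minimal sketch, not the assignment's solution: the 224x224 input size, the batch of 2, and the 10-unit linear layer are all arbitrary assumptions made for the example.

```python
import torch
import torch.nn as nn

# One conv + pool stage, mirroring the first block of the model in this thread.
block = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

dummy = torch.randn(2, 3, 224, 224)  # assumed input size, batch of 2
out = block(dummy)
print(out.shape)  # torch.Size([2, 32, 112, 112])

# Flatten everything except the batch dimension, then size the linear
# layer from the flattened tensor instead of hard-coding the number.
flat = torch.flatten(out, 1)
fc = nn.Linear(flat.size(1), 10)  # 10 outputs is an arbitrary choice
print(fc(flat).shape)  # torch.Size([2, 10])
```

Flattening with torch.flatten(x, 1) keeps the batch dimension fixed, so a wrongly sized linear layer raises a shape error at that layer rather than quietly producing a different batch size.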