Batchnorm throwing error after running fine

Jordan_Howell · June 29, 2020, 6:35pm

Hello,

I’ve had a model running without error. After increasing my training data size, I’m now getting the error below.

The error shows up after the last epoch has run on the training data.

I read that I should replace torch.no_grad() with roof_model.eval(), but that didn’t seem to help.

Traceback (most recent call last):

  File "<ipython-input-70-5d1647d029f8>", line 33, in <module>
    y_pred = roof_model(images, num_tensor, col_tensor)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)

  File "<ipython-input-69-2bc68a827381>", line 31, in forward
    x = self.fc2_b(x)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\batchnorm.py", line 107, in forward
    exponential_average_factor, self.eps)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\nn\functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512])

Here is my model:


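import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision import models


class Image_Model(nn.Module):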
    def __init__(self, embedding_size):
        super().__init__()
        self.all_embeddings = nn.ModuleList([nn.Embedding(categories,
                                                          size) 
                                             for categories,
                                             size in embedding_size])
        n_emb = sum([y for x, y in embedding_size])
        self.n_emb = n_emb
        self.embedding_dropout = nn.Dropout(p = 0.04)
        self.cnn = models.resnet50(pretrained=True)
        # Freeze the pretrained ResNet weights (attribute is param.requires_grad).
        for param in self.cnn.parameters():
            param.requires_grad = False

        self.fc2 = nn.Sequential(nn.Linear(1042, 512))
        self.fc2_b = nn.BatchNorm1d(512)
        self.fc3 = nn.Dropout(p=.04)
        self.fc4 = nn.Sequential(nn.Linear(512, 256))
        self.fc4_b = nn.BatchNorm1d(256)
        self.fc5 = nn.Dropout(p = 0.04)
        self.fc6 = nn.Sequential(nn.Linear(256, 2))
        
    def forward(self, image, numerical_columns, cat_columns):
        x = self.cnn(image)  
        e = [emb_layer(cat_columns[:,i]) for i,
             emb_layer in enumerate(self.all_embeddings)]
        e = torch.cat(e, 1)
        x = torch.cat((x, numerical_columns), dim = 1)
        x = torch.cat((x, e), dim = 1)
        x = F.relu(self.fc2(x))
        x = self.fc2_b(x)
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = self.fc4_b(x)
        x = F.relu(self.fc5(x))
        x = F.relu(self.fc6(x))
        x = F.log_softmax(x, dim=1)  # normalize over the class dimension
        return x

# Loss and optimizer
torch.manual_seed(1010)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
roof_model = Image_Model(embeddings).to(device)
criterion = torch.nn.NLLLoss().to(device)
optimizer = torch.optim.Adam(roof_model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 2, verbose = True,
                              min_lr = .0000000001)

Both test and train are running in batches of 10. I’m not sure what could have changed this.

ptrblck · June 30, 2020, 9:24am

This error is raised when a batchnorm layer in training mode receives only a single value per channel, since it cannot calculate batch statistics from one value.
You would usually see it during training (model.train()), e.g. if a batch contains a single sample.
That can happen if the length of your dataset is not evenly divisible by the batch size, which leaves a smaller batch in the last iteration.
If that’s the case, you can avoid it by dropping this potentially smaller batch with drop_last=True in your DataLoader.
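
A minimal sketch of the above, assuming a stand-in dataset of 21 random samples (not the real roof data) and the batch size of 10 mentioned in the question. It shows how a leftover batch of size 1 arises, how drop_last=True removes it, and how nn.BatchNorm1d reacts to a single sample in train mode versus eval mode. The names dataset, default_loader, and safe_loader are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 21 samples with 512 features each.
# 21 is not divisible by 10, so the last batch holds a single sample.
dataset = TensorDataset(torch.randn(21, 512))

# Default behavior keeps the incomplete final batch.
default_loader = DataLoader(dataset, batch_size=10)
default_sizes = [batch[0].shape[0] for batch in default_loader]
print(default_sizes)  # [10, 10, 1]

# drop_last=True discards the incomplete final batch.
safe_loader = DataLoader(dataset, batch_size=10, drop_last=True)
safe_sizes = [batch[0].shape[0] for batch in safe_loader]
print(safe_sizes)  # [10, 10]

bn = nn.BatchNorm1d(512)

# In train mode, a batch of one sample cannot provide batch statistics.
bn.train()
try:
    bn(torch.randn(1, 512))
    raised = False
except ValueError as err:
    raised = True
    print(err)

# In eval mode, the stored running statistics are used instead,
# so a single sample passes through.
bn.eval()
out = bn(torch.randn(1, 512))
print(out.shape)
```

Note that drop_last=True skips the leftover samples in each epoch; with shuffle=True in the training DataLoader, different samples are dropped each time.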