I mistakenly used BatchNorm2d with batch size 1, does this matter?

configs = [3, 96, 256, 384, 384, 256]
self.featureExtract = nn.Sequential(  # comments: output spatial size for a 271 / 127 input
            nn.Conv2d(configs[0], configs[1], kernel_size=11, stride=2),  # 131   59
            nn.BatchNorm2d(configs[1]),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 65   29
            nn.ReLU(inplace=True),

            nn.Conv2d(configs[1], configs[2], kernel_size=5),  # 61   25
            nn.BatchNorm2d(configs[2]),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 30   12
            nn.ReLU(inplace=True),

            nn.Conv2d(configs[2], configs[3], kernel_size=3),  # 28   10
            nn.BatchNorm2d(configs[3]),
            nn.ReLU(inplace=True),

            nn.Conv2d(configs[3], configs[4], kernel_size=3),  # 26   8
            nn.BatchNorm2d(configs[4]),
            nn.ReLU(inplace=True),
            nn.Conv2d(configs[4], configs[5], kernel_size=3),  # 24   6
            nn.BatchNorm2d(configs[5])
        )
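For reference, the spatial sizes in the comments can be verified with a quick shape check. This is just a minimal sketch, assuming `net` is a placeholder name for an instance of the module that defines featureExtract:

import torch

with torch.no_grad():
    x = torch.randn(1, 3, 271, 271)    # one search image, batch size 1
    feat = net.featureExtract(x)
    print(feat.shape)                  # expected: torch.Size([1, 256, 24, 24])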

Hey guys, recently I have been training my network with batch size = 1 for particular reasons. After training for a while, I realized that batch size 1 with BatchNorm may cause problems. BUT during training the accuracy is still fine, why?

And when evaluating the model, if I set model.eval() the model just outputs nan or -inf. Why is this? Should I remove all the BatchNorm layers, since I only use batch size 1?
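In eval mode, BatchNorm switches from the current batch's statistics to the running estimates accumulated during training, so those buffers are worth inspecting. A minimal sketch, assuming the network variable is called model:

import torch
import torch.nn as nn

for name, m in model.named_modules():
    if isinstance(m, nn.BatchNorm2d):
        # running_mean / running_var are what model.eval() normalizes with;
        # with batch size 1 the variance estimates can collapse or blow up
        print(name,
              "mean finite:", torch.isfinite(m.running_mean).all().item(),
              "min var:", m.running_var.min().item())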

PLUS, if I evaluate while the model is still in model.train() mode, the results are fine, amazingly. Why?
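In train mode, BatchNorm normalizes with the current input's own statistics rather than the running estimates, which is why this works. If you want eval behaviour for everything else but batch statistics for the norm layers, one common workaround looks roughly like this (just a sketch, not necessarily the right fix):

import torch.nn as nn

def eval_keep_batch_stats(model):
    model.eval()                        # dropout etc. stay in eval mode
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()                   # BN keeps using per-sample statistics
                                        # (note: this also keeps updating the running stats)
    return model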

Also, I have heard that with batch size 1 it is actually InstanceNorm…
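That equivalence can be checked numerically. A minimal sketch: with affine and running-stats tracking disabled, a BatchNorm2d in training mode on a single sample normalizes each channel over H x W, exactly like InstanceNorm2d:

import torch
import torch.nn as nn

x = torch.randn(1, 256, 24, 24)  # batch size 1
bn = nn.BatchNorm2d(256, affine=False, track_running_stats=False).train()
inorm = nn.InstanceNorm2d(256, affine=False)
print(torch.allclose(bn(x), inorm(x), atol=1e-5))  # True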

If you have enough features per channel (here, spatial locations) for the statistics to be stable, that isn't a problem. In fact, there is InstanceNorm, which deliberately does this.
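For example, a batch-size-independent drop-in for the 96-channel layer above could look like this (just a sketch; the GroupNorm group count is an arbitrary illustrative choice):

import torch.nn as nn

# per-sample, per-channel statistics (what batch size 1 effectively gives you)
norm1 = nn.InstanceNorm2d(96, affine=True)
# GroupNorm normalizes over channel groups and is also independent of batch size
norm2 = nn.GroupNorm(num_groups=32, num_channels=96)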

Best regards

Thomas