import torch.nn as nn

class Net(nn.Module):  # minimal wrapper for context; this snippet sits inside my model's __init__
    def __init__(self):
        super().__init__()
        configs = [3, 96, 256, 384, 384, 256]
        # comments give the output spatial size for 271- and 127-pixel inputs
        self.featureExtract = nn.Sequential(  # 271 127
            nn.Conv2d(configs[0], configs[1], kernel_size=11, stride=2),  # 131 59
            nn.BatchNorm2d(configs[1]),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 65 29
            nn.ReLU(inplace=True),
            nn.Conv2d(configs[1], configs[2], kernel_size=5),  # 61 25
            nn.BatchNorm2d(configs[2]),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 30 12
            nn.ReLU(inplace=True),
            nn.Conv2d(configs[2], configs[3], kernel_size=3),  # 28 10
            nn.BatchNorm2d(configs[3]),
            nn.ReLU(inplace=True),
            nn.Conv2d(configs[3], configs[4], kernel_size=3),  # 26 8
            nn.BatchNorm2d(configs[4]),
            nn.ReLU(inplace=True),
            nn.Conv2d(configs[4], configs[5], kernel_size=3),  # 24 6
            nn.BatchNorm2d(configs[5]),
        )
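For reference, a quick sanity check of the sizes in the comments (assuming the minimal Net wrapper above):

import torch

net = Net()
z = torch.randn(1, 3, 127, 127)  # smaller input
x = torch.randn(1, 3, 271, 271)  # larger input
print(net.featureExtract(z).shape)  # torch.Size([1, 256, 6, 6])
print(net.featureExtract(x).shape)  # torch.Size([1, 256, 24, 24])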
Hey guys, recently I have been training my network with batch size 1, for special reasons. After training for a while, I realized that batch size 1 with batch norm may cause problems. But during training, the training accuracy is still fine. Why?
Also, when I evaluate the model with model.eval(), it just outputs nan or -inf. Why is this? Should I remove all the batch norm layers, since my batch size is only 1?
Plus, if I evaluate while keeping the model in model.train(), the results are still fine. Amazing. Why?
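Here is a small standalone experiment I tried (just a single BatchNorm2d layer, not my actual model), which shows that the two modes normalize with different statistics: train() uses the current batch's own mean/var, while eval() uses the running estimates accumulated during training:

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)       # standalone layer, not my actual model
x = torch.randn(1, 4, 8, 8)  # batch size 1

bn.train()
y_train = bn(x)  # normalized with THIS batch's mean/var
                 # (also updates running_mean/running_var)
bn.eval()
y_eval = bn(x)   # normalized with the running estimates instead

print(y_train.mean().item(), y_eval.mean().item())  # generally differ
print(bn.running_mean, bn.running_var)              # the stats eval() uses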
Also, I have heard that when the batch size is 1, batch norm is actually instance norm…
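If that is right, then with batch size 1 a train-mode BatchNorm2d normalizes each channel over H x W only, which is exactly what InstanceNorm2d does per sample. A quick numerical check I sketched (affine disabled on both layers so only the normalization itself is compared):

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8)  # batch size 1

bn = nn.BatchNorm2d(4, affine=False).train()  # batch stats over (N, H, W); here N == 1
inorm = nn.InstanceNorm2d(4, affine=False)    # per-sample stats over (H, W)

print(torch.allclose(bn(x), inorm(x), atol=1e-6))  # True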