I am working on lung segmentation using a Fully Convolutional DenseNet that has both batchnorm and dropout layers. The model was trained using `nn.DataParallel` on two GPUs. When I run inference, I first wrap the model in `nn.DataParallel` and then load the weights. Inference results without calling `model.eval()` look as expected. However, when I run inference with `model.eval()`, the segmentation fails and I get random clusters of pixels inside the lungs. I have checked the `.training` property of the `nn.BatchNorm2d` and `nn.Dropout2d` layers in my parallel model after calling `model.train()` and `model.eval()`, and the values look correct (`True` after `model.train()` and `False` after `model.eval()`):
```python
model = nn.DataParallel(FCDN()).cuda()
model.eval()
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        print('bn', m.training)  # bn False
    if isinstance(m, nn.Dropout2d):
        print('do', m.training)  # do False
```
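For completeness, the weight loading looks roughly like this (the `FCDN` stub and the checkpoint path are placeholders to keep the snippet self-contained, not my actual code):

```python
import torch
import torch.nn as nn

class FCDN(nn.Module):
    # Minimal stand-in for the real FC-DenseNet, just so the snippet
    # runs on its own; the real model has many more layers.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1),
            nn.BatchNorm2d(8),
            nn.Dropout2d(0.2),
        )

    def forward(self, x):
        return self.features(x)

# Wrap in DataParallel first, so the checkpoint's 'module.'-prefixed
# state_dict keys match, then load the trained weights.
model = nn.DataParallel(FCDN())
# state = torch.load('checkpoint.pth', map_location='cpu')  # placeholder path
# model.load_state_dict(state)
model.eval()
```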
It seems the problem could be related to this issue: https://github.com/pytorch/pytorch/issues/1051, but I'm not sure I fully understand the discussion there.
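My rough understanding of that issue is that the BatchNorm running statistics can end up poorly estimated during training, which only shows up once `model.eval()` switches from per-batch statistics to the running buffers. A quick sanity check (just a sketch; the helper name is mine, and a freshly initialized BN layer would report mean 0 and var 1) is to dump the running buffers and look for wildly large values:

```python
import torch.nn as nn

def dump_bn_stats(model):
    # Print a one-line summary of each BatchNorm2d layer's running
    # statistics; extreme means/vars would point at badly estimated stats.
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            print(name,
                  'mean', m.running_mean.abs().mean().item(),
                  'var', m.running_var.mean().item())
```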
Since the segmentations look acceptable without `model.eval()`, this is not a big issue for me, but I wonder whether the results could be better if I knew how to use `model.eval()` correctly.
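In case it is useful to anyone hitting the same thing: one workaround I have seen suggested for this kind of problem is to call `model.eval()` (so dropout is disabled) and then flip just the BatchNorm layers back to train mode, so they normalize with per-batch statistics instead of the suspect running ones. A sketch (the helper name is mine):

```python
import torch.nn as nn

def eval_except_batchnorm(model):
    # Put everything in eval mode (disables dropout, etc.) ...
    model.eval()
    # ... but keep BatchNorm layers using per-batch statistics.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()
    return model
```

One caveat: BatchNorm layers left in train mode still update their running buffers on every forward pass, even during inference, which may or may not matter here.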