Fluctuating performance of a pre-trained model (frozen parameters) -- model.eval() doesn't seem to work for the dropout layer

I trained a model and tracked its performance over the training epochs.

I thought the parameters from epoch 44 were the best, so I loaded them:

import torch
import torch.nn as nn

model = Model()
model.to(device=args.device)
model.load_state_dict(torch.load(args.weight_path))
criterion_reduction_none = nn.CrossEntropyLoss(reduction='none')
model.eval()

After that, I ran prediction on the same data set to observe the model's performance.
Originally, I thought the prediction results would always match the values from epoch 44.
In my opinion, it's just like evaluating y = f(x):
the f is the frozen-parameter model, and the x is the data set.
But in fact, the model's performance fluctuates within a certain range from run to run.

Why does it come out like this?
Here is my model:

class Model(nn.Module):
    # (layer definitions in __init__ are omitted in the original post)
    def forward(self, input_image):
        out = self.conv_1(input_image)  # 20 * 44 * 44
        out = self.pooling_1(out)       # 20 * 22 * 22

        out = self.conv_2(out)          # 50 * 16 * 16
        out = self.pooling_2(out)       # 50 * 8 * 8

        out = self.conv_3(out)          # 500 * 2 * 2
        out = self.pooling_3(out)       # 500 * 1 * 1

        out = F.dropout(out)
        out = F.relu(out)
        out = self.conv_4(out)          # 2 * 1 * 1
        return out

I heard that model.eval() is needed when there is a dropout layer in the model, and I did call it.
Or is this just normal behavior for any deep learning model?

First, I verified the model's predicted output values, using 1% of my data set (11 samples) as verification data.

The first time predicted output:
tensor([[ 2.3544, -1.6035],
        [-2.8536,  2.6495],
        [-1.4212,  0.3854],
        [-0.3363, -0.2809],
        [ 0.3346, -0.5889],
        [-1.1946,  2.2831],
        [-0.2375, -0.7653],
        [-0.1411,  0.0929],
        [-1.5674,  2.0254],
        [-0.0735,  0.0934],
        [ 1.3332, -1.3259]], device='cuda:0', grad_fn=<SqueezeBackward0>)

The second time predicted output:
tensor([[ 1.6338, -2.4857],
        [-2.7193,  2.8088],
        [-0.8641,  0.8308],
        [-0.6862, -0.8741],
        [-0.3699, -0.3573],
        [-1.7963,  1.0194],
        [ 0.2741, -0.1906],
        [-0.6578, -0.2878],
        [-1.3425,  2.3365],
        [ 0.1620,  0.3339],
        [ 0.8572, -1.2491]], device='cuda:0', grad_fn=<SqueezeBackward0>)

The third time predicted output:
tensor([[ 1.7435, -1.8670],
        [-2.4884,  2.8394],
        [-1.0321,  1.0768],
        [-0.5901, -0.0761],
        [ 0.0477, -0.6905],
        [-1.3724,  0.0560],
        [ 0.1470, -0.4054],
        [ 0.0067,  0.2553],
        [-1.7054,  1.6145],
        [ 0.3581, -0.2970],
        [ 1.3902, -1.1217]], device='cuda:0', grad_fn=<SqueezeBackward0>)

The outputs change on every single run!
How could this be?

Are you saying that given fixed data and a fixed model, you get different outputs on every forward pass?
To answer the dropout part: yes, model.eval() is necessary to turn off dropout during inference.

Can you check whether the model is actually loaded in the first place, and that no parameters are randomly initialized on every forward pass?
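
For example, with a module-style nn.Dropout layer, eval() makes the output deterministic (a standalone sketch):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(5)

drop.train()    # training mode: dropout is active
print(drop(x))  # elements randomly zeroed (survivors scaled by 1/(1-p)), different every call

drop.eval()     # eval mode: dropout becomes a no-op
print(drop(x))  # tensor([1., 1., 1., 1., 1.])
print(drop(x))  # identical on every call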

Second, I tried removing the dropout layer.
The model now looks like this:

class Model(nn.Module):
    # (layer definitions in __init__ are omitted in the original post)
    def forward(self, input_image):
        out = self.conv_1(input_image)  # 20 * 44 * 44
        out = self.pooling_1(out)       # 20 * 22 * 22

        out = self.conv_2(out)          # 50 * 16 * 16
        out = self.pooling_2(out)       # 50 * 8 * 8

        out = self.conv_3(out)          # 500 * 2 * 2
        out = self.pooling_3(out)       # 500 * 1 * 1

        # out = F.dropout(out)  # removed
        out = F.relu(out)
        out = self.conv_4(out)          # 2 * 1 * 1
        return out

Then the fluctuation completely disappeared.

But I don’t know the essential reason, because I did set model.eval().

Thanks for your reply.

Yes, it’s incredible, but the model's output really does fluctuate slightly.

Yes, but model.eval() doesn’t seem to work here.

  • I’m sorry, I don’t know what "the model is loaded in the first place" means.
    I save and load the model like this (a check for the load is sketched after this list):
# save
path_save_model = os.path.join('model', 'weight_{epoch}.pth'.format(epoch=(epoch + 1)))
torch.save(model.state_dict(), path_save_model)

# load
model = Model()
model.to(device=args.device)
model.load_state_dict(torch.load(args.weight_path))
criterion_reduction_none = nn.CrossEntropyLoss(reduction='none')
model.eval()
  • The model is just a simple CNN; there isn’t even a fully connected layer.
    I’m sure there is NO random initialization of params during any forward pass.
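
One way to confirm the checkpoint really loaded is to inspect the return value of load_state_dict (a minimal check, reusing the load code above):

result = model.load_state_dict(torch.load(args.weight_path))
print(result)  # prints "<All keys matched successfully>" when nothing is missing or unexpected
assert not result.missing_keys and not result.unexpected_keys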

I don’t know what the issue would be, but I wanted to point out that you don’t have to use real data for problems like this. You can do something like:

zeros = torch.zeros(your_input_shape, device=args.device)  # dummy input on the model's device
print(model(zeros).sum())
print(model(zeros).sum())
print(model(zeros).sum())

If there is nondeterminism, it should show up.

You could probably even go into your model’s forward definition and do print(out.sum()) at multiple points, to see where the nondeterminism is showing up. But I guess you already found out that it has something to do with the dropout.
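
For instance, a probed version of the forward pass might look like this (a sketch assuming the Model class from the post above):

import torch.nn.functional as F

class ProbedModel(Model):
    # same model, with sum() probes to localize the nondeterminism
    def forward(self, input_image):
        out = self.conv_1(input_image)
        out = self.pooling_1(out)
        print('after block 1:', out.sum().item())  # deterministic: identical across runs
        out = self.conv_2(out)
        out = self.pooling_2(out)
        print('after block 2:', out.sum().item())  # still deterministic
        out = self.conv_3(out)
        out = self.pooling_3(out)
        print('after block 3:', out.sum().item())  # still deterministic
        out = F.dropout(out)
        print('after dropout:', out.sum().item())  # this is where runs start to diverge
        out = F.relu(out)
        out = self.conv_4(out)
        return out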


Thanks for your reply.

zeros = torch.zeros(your_input_shape, device=args.device)
print(model(zeros).sum())
print(model(zeros).sum())
print(model(zeros).sum())

I learned a new PyTorch skill.
Thank you so much!

Maybe I got the answer here.
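
For future readers: the cause is that F.dropout is a plain function whose training argument defaults to True, so model.eval() (which only flips self.training on registered submodules) never reaches it. A minimal sketch contrasting the buggy call with two fixes:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)  # registered module: respects eval()

    def forward(self, x):
        a = F.dropout(x)                          # bug: training defaults to True, ignores eval()
        b = F.dropout(x, training=self.training)  # fix 1: pass the module's mode explicitly
        c = self.drop(x)                          # fix 2: use the registered nn.Dropout module
        return a, b, c

m = Tiny().eval()
a, b, c = m(torch.ones(4))
print(a)  # still randomly zeroed, despite eval()
print(b)  # tensor([1., 1., 1., 1.]) -- dropout correctly disabled
print(c)  # tensor([1., 1., 1., 1.]) -- dropout correctly disabled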
