Hi, I encountered a strange problem: when I set model.eval() in the evaluation stage and extract bottleneck features from audio, the extracted embeddings are all [NaN], but when I set model.train(), the embeddings are normal numbers. If I set model.train() and set requires_grad = False on all model parameters, will the inference results be accurate? Thanks very much for your help.
Could you please post an executable snippet that’d reproduce this error?
Certain layers, like batch norm, work differently in train mode, so it's best to use eval mode for inference.
It depends on what layers you have in your model. Typically:
- If you have any dropout layers, setting model.eval() will disable the random masking.
- If you have any batch norm layers, your model will use the running stats instead of statistics computed from the current input (see the sketch below).
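To make both effects concrete, here is a tiny standalone sketch (an illustrative toy model, not yours) showing how the two modes change the output for the same input:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.BatchNorm1d(4), nn.Dropout(p=0.5))
x = torch.randn(8, 4)

model.train()
out_train = model(x)  # batch norm normalizes with batch stats; dropout zeroes ~half the activations

model.eval()
out_eval = model(x)   # batch norm normalizes with running stats; dropout is a no-op

print(torch.equal(out_train, out_eval))  # False: the two modes produce different outputs
```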
You most likely want to debug why eval mode is failing for your model, so I’d second srishti’s request for a self-contained code snippet to reproduce the issue.
Thanks for your reply. I found the problem occurs in the BatchNorm1d line:

```python
self.conv = Conv1d(
    in_channels=in_channels,
    out_channels=out_channels,
    kernel_size=kernel_size,
    dilation=dilation,
)
self.activation = activation()
self.norm = BatchNorm1d(input_size=out_channels)

def forward(self, x):
    pdb.set_trace()
    for name, param in self.norm.named_parameters():
        print("name----->", name)
        print("params---->", param)
    pdb.set_trace()
    return self.norm(self.activation(self.conv(x)))
```
The outputs of self.conv(x) and self.activation() are fine, and the 'weight' and 'bias' parameters of self.norm look fine, but any tensor that goes through self.norm(x) becomes NaN. When I set track_running_stats=False, self.norm(x) works well.
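Since named_parameters() only shows the learnable weight and bias, the running_mean/running_var that eval mode actually uses are worth checking too; they are stored as buffers, not parameters. A minimal check (here `model` stands in for your network, and this assumes a plain torch.nn.BatchNorm1d underneath the wrapper):

```python
import torch

# The affine parameters (weight/bias) appear in named_parameters(), but the
# running statistics that eval mode relies on are registered as buffers.
for name, buf in model.named_buffers():
    if buf.dtype.is_floating_point and torch.isnan(buf).any():
        print(f"NaN found in buffer: {name}")
```

If running_mean or running_var turns up here, the stats were corrupted at some point during training (e.g. by a NaN batch), which would explain why eval mode emits NaNs while train mode, which normalizes with the current batch statistics, does not.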
I have another question: if I set track_running_stats=False for torch.nn.BatchNorm1d() in both the training phase (model.train()) and the test phase (model.eval()), will it reduce the system performance?
During eval, the accuracy of your model can suffer if, for example, you pass inputs with a batch size of 1, because the activations are then normalized with statistics computed from that single input rather than with stable running stats. It may not matter if you are using large batch sizes during evaluation.
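As a small illustration of why batch size matters here (a toy layer, not your model): with track_running_stats=False, the output for a given sample depends on the rest of the batch, because batch statistics are used even in eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4, track_running_stats=False)
bn.eval()  # no running stats exist, so batch statistics are used even in eval mode

x = torch.randn(8, 4, 10)
out_in_batch = bn(x)[0]   # first sample, normalized together with 7 others
out_alone = bn(x[:1])[0]  # the same sample, normalized on its own
print(torch.allclose(out_in_batch, out_alone))  # False: output depends on the batch
```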
Thank you. I got it.