Inference time on CPU varying massively with training weight choice

Wologman · April 17, 2023, 5:59am

I’m trying to classify 128 x 313 images using EfficientNetV2. (tf_efficientnetv2_s_in21k from timm). Created from Mel Spectrograms.

I can’t understand why my inference speed is varying by a factor of up to 20, depending on which set of model weights I use. Could somebody please explain what might cause this? The relevant details I can think of sumarised below:

In all cases they are weights I have trained with another notebook, then loaded from checkpoint.
Trained, and inference with float32 normalised onto [0, 255]
Trained on GPU, mixed precision but inference on CPU.
I don’t think there are memory issues
The inference batch size doesn’t seem to make much difference
Earlier I was getting inference completed in 9 seconds per 120 image files, now it takes 380 seconds, changing nothing but the checkpoint weights.
Using PyTorch Lightning, and not using a dataloader for inference.

ptrblck · April 17, 2023, 7:14am

Try to use torch.set_flush_denormal(True) and check if this changes the performance (assuming you are using a x86 CPU supporting SSE3 instructions).

Wologman · May 22, 2023, 1:46am

This indeed was the solution, thanks @ptrblck I initially tried applying it to the inference notebook and only got a modest improvement. It was setting it on the training process that did the trick. That didn’t occur to me at first.