Inference time on CPU varying massively with training weight choice

I’m trying to classify 128 x 313 images (created from Mel spectrograms) using EfficientNetV2 (tf_efficientnetv2_s_in21k from timm).

I can’t understand why my inference speed varies by a factor of up to 20 depending on which set of model weights I use. Could somebody please explain what might cause this? The relevant details I can think of are summarised below:

  • In all cases the weights are ones I trained in another notebook and then loaded from a checkpoint.

  • Training and inference both use float32 inputs normalised to the range [0, 255].

  • Trained on GPU with mixed precision, but inference runs on the CPU.

  • I don’t think there are memory issues

  • The inference batch size doesn’t seem to make much difference

  • Earlier, inference on 120 image files completed in 9 seconds; now it takes 380 seconds, with nothing changed except the checkpoint weights.

  • Using PyTorch Lightning, and not using a DataLoader for inference (a rough sketch of the setup is below).
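For context, the inference path looks roughly like the sketch below. This is a simplified reconstruction rather than my exact code: the checkpoint path, number of classes, input channel count, and the state_dict key prefix are placeholders.

```python
import timm
import torch

CKPT_PATH = "weights.ckpt"  # placeholder path

# Rebuild the timm backbone (num_classes and in_chans are placeholders here).
model = timm.create_model("tf_efficientnetv2_s_in21k", num_classes=2, in_chans=1)

# Lightning checkpoints nest the weights under "state_dict", usually prefixed
# with the attribute name of the wrapped module (assumed to be "model." here).
ckpt = torch.load(CKPT_PATH, map_location="cpu")
state_dict = {k.replace("model.", "", 1): v for k, v in ckpt["state_dict"].items()}
model.load_state_dict(state_dict)
model.eval()

with torch.no_grad():
    # One batch of Mel spectrograms: float32 in [0, 255], shape (N, 1, 128, 313).
    batch = torch.rand(8, 1, 128, 313) * 255.0
    logits = model(batch)
```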

Try using torch.set_flush_denormal(True) and check whether it changes the performance (assuming you are using an x86 CPU supporting SSE3 instructions).
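A minimal check, based on the example in the torch.set_flush_denormal docs:

```python
import torch

# Returns True if the CPU supports flushing denormals (x86 with SSE3) and the
# mode was enabled successfully, False otherwise.
print(torch.set_flush_denormal(True))

# Sanity check: a denormal value is now read back as zero.
print(torch.tensor([1e-323], dtype=torch.float64))  # tensor([0.], dtype=torch.float64)
```

Denormal (subnormal) values in the weights or activations can push the CPU onto much slower microcode paths, which would explain a large slowdown that depends only on which checkpoint is loaded.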


This indeed was the solution, thanks @ptrblck. I initially tried applying it to the inference notebook and only got a modest improvement; it was setting it in the training process that did the trick. That didn’t occur to me at first.
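In case it helps anyone else, this is roughly where the call ended up. It’s a stripped-down sketch of the training script, not the real thing: the LightningModule, the dummy data, and the Trainer arguments are stand-ins.

```python
import timm
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class LitClassifier(pl.LightningModule):
    # Stand-in for the real LightningModule wrapping the timm backbone.
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.model = timm.create_model("tf_efficientnetv2_s_in21k",
                                       num_classes=num_classes, in_chans=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.model(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    # The line that mattered: flush denormals in the training process,
    # not only in the inference notebook.
    torch.set_flush_denormal(True)

    # Dummy data standing in for the Mel-spectrogram dataset.
    ds = TensorDataset(torch.rand(32, 1, 128, 313) * 255.0,
                       torch.randint(0, 2, (32,)))
    trainer = pl.Trainer(accelerator="gpu", devices=1, precision=16, max_epochs=1)
    trainer.fit(LitClassifier(), DataLoader(ds, batch_size=8))
```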