InceptionV3 activation function for testing?

I’m working on an image classification problem with InceptionV3 (binary for now, but it will scale to 3 or 4 classes later on).
When training I use the raw outputs (logits) of the classifier, which is an FC layer with a 2-neuron output. My loss function is a focal loss built on BCEWithLogitsLoss, chosen because of the high class imbalance. It seems to work pretty well, although there is some overfitting.
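Roughly, the loss looks like this (a sketch; `alpha` and `gamma` are illustrative defaults, and targets are float one-hot vectors matching the 2-neuron output):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss on top of BCE-with-logits; alpha/gamma are illustrative."""
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        # Per-element BCE computed on raw logits for numerical stability.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # model probability assigned to the true class
        # Down-weight easy examples by (1 - p_t)^gamma.
        return (self.alpha * (1.0 - p_t) ** self.gamma * bce).mean()
```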

When testing the saved model on unseen data, my metrics vary a lot depending on the batch size used for testing. There is clearly something wrong, but I cannot find it. I have two ideas:

  1. I’m doing something wrong when testing the network, as it probably needs an activation function: a sigmoid or a softmax, and then thresholding? (See the sketch right after this list.)
  2. Normalization/data augmentation: I normalize my data using the mean and standard deviation of the whole dataset, for both training and testing, and I apply data augmentation to the training data but not to the test data, which I believe is correct. I’ve seen on this forum that batch normalization layers may affect the results at test time, but that shouldn’t apply here since I’m setting the model to eval mode.
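For idea 1, this is what I have in mind for test time (a sketch; `model` and `images` stand for my InceptionV3 and a test batch):

```python
import torch

model.eval()  # batchnorm uses running stats, dropout is disabled
with torch.no_grad():
    logits = model(images)         # raw scores, shape (batch_size, 2)
    probs = torch.sigmoid(logits)  # sigmoid matches the BCE-with-logits objective
    preds = probs.argmax(dim=1)    # or threshold probs[:, 1] > 0.5 explicitly
```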

Did you forget to call model.eval() before starting the evaluation?
This could indicate that e.g. batchnorm layers are still in training mode and are normalizing the input activations using the batch stats.
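A quick sanity check (a sketch, with `model` being your network) to confirm eval mode actually reached the batchnorm layers:

```python
import torch.nn as nn

model.eval()
# After eval(), every batchnorm module should report training=False.
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        assert not m.training
```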

No. I was calling model.eval() from the beginning.
I’ve found the bug: I was computing the AUROC per batch and then averaging it over each epoch. Due to the high class imbalance, some of the batches contained only negative samples, for which AUROC is undefined.
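The fix was to collect the predictions and targets over the whole epoch and compute the AUROC once. Roughly (a sketch; `test_loader` stands for my dataloader, and labels are assumed to be 0/1 integers):

```python
import torch
from sklearn.metrics import roc_auc_score

model.eval()
all_probs, all_targets = [], []
with torch.no_grad():
    for images, targets in test_loader:
        probs = torch.sigmoid(model(images))[:, 1]  # positive-class probability
        all_probs.append(probs.cpu())
        all_targets.append(targets.cpu())

# A single AUROC over the full epoch: batches that contain only
# negatives no longer produce an undefined per-batch score.
auroc = roc_auc_score(torch.cat(all_targets).numpy(),
                      torch.cat(all_probs).numpy())
```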

Is it necessary to set track_running_stats = False for every BatchNorm2d layer?

No, it’s not necessary; that would disable the running stats, and your inference outputs would then also depend on the batch statistics (and thus on the batch size).
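To illustrate the difference, a small self-contained sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4)

bn = nn.BatchNorm2d(3).eval()  # running stats kept (the default)
bn_no_stats = nn.BatchNorm2d(3, track_running_stats=False).eval()

# With running stats, each sample is normalized independently of its batch:
print(torch.allclose(bn(x)[:2], bn(x[:2])))                    # True

# Without them, eval mode falls back to batch statistics, so the same
# samples come out differently depending on the rest of the batch:
print(torch.allclose(bn_no_stats(x)[:2], bn_no_stats(x[:2])))  # False (in general)
```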