Performance drop of trained model in production

Hello everyone,

I am pretty sure this question has been asked hundreds of times, but I couldn't find an answer to my particular problem.

Context: I fine-tuned a pretrained ResNet18 on 2 classes, with 2700/3100 images per class for training and 680/770 for validation. The model trained well: I ended up with an accuracy of around 99% and a loss of around 0.01 - 0.02.

However, when we integrated the model into our system, we quickly noticed it performs only about half as well as it did on the validation data, even though the data we feed it is pretty much the same. I am 99% sure the problem is in the preprocessing, but I am unable to find where I made the mistake. I've been staring at the code for too long; if anyone could have a quick glance, you could probably spot what I did wrong.

In step 1, we read N frames of a video (or just one photo) and load them to the GPU. No preprocessing is done at this point; these are the original images:
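(The screenshot isn't reproduced here; as a rough sketch of what this step amounts to, with a stand-in frame instead of real video decoding, the tensors arrive on the GPU still as raw uint8 images:)

```python
import torch

# Sketch of step 1 (assumed): a decoded frame is moved to the GPU as-is,
# with no resizing, scaling, or normalization applied yet.
device = "cuda" if torch.cuda.is_available() else "cpu"

frame = torch.zeros((3, 720, 1280), dtype=torch.uint8)  # stand-in decoded frame (CHW)
frame = frame.to(device)

print(frame.dtype, frame.shape)  # still uint8, original resolution
```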

Then the YOLO net finds objects for us. Using the coordinates of the detected objects, we slice the torch tensors on the GPU, accumulate the crops in a list, and send them to our classification ResNet. The classification model has to do all the necessary preprocessing itself, because the images uploaded to the GPU underwent none.
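(For readers without the screenshot: the slicing step can be sketched like this, with made-up box coordinates standing in for the YOLO output:)

```python
import torch

# Sketch: cropping detected objects out of a GPU frame tensor.
# The frame and the (x1, y1, x2, y2) boxes are illustrative, not real detections.
device = "cuda" if torch.cuda.is_available() else "cpu"
frame = torch.randint(0, 256, (3, 720, 1280), dtype=torch.uint8, device=device)

boxes = [(100, 50, 300, 250), (400, 300, 600, 500)]  # (x1, y1, x2, y2)

crops = []
for x1, y1, x2, y2 in boxes:
    # Slicing keeps the tensor on the same device; no copy back to the CPU.
    crops.append(frame[:, y1:y2, x1:x2])

print([tuple(c.shape) for c in crops])  # each crop is (3, y2 - y1, x2 - x1)
```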

The normalization function:
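(The screenshot itself isn't reproduced here. A typical GPU-side normalization for a uint8 CHW batch looks like the sketch below; the ImageNet mean/std values are my assumption, not necessarily what the screenshot contains:)

```python
import torch

# Sketch of GPU-side normalization with assumed ImageNet statistics.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(batch: torch.Tensor) -> torch.Tensor:
    """batch: (N, 3, H, W) uint8 in RGB order, values 0-255."""
    x = batch.float() / 255.0                      # ToTensor() equivalent: scale to [0, 1]
    return (x - MEAN.to(x.device)) / STD.to(x.device)

x = torch.full((1, 3, 224, 224), 255, dtype=torch.uint8)
y = normalize(x)
# A pure-white pixel maps to (1 - mean) / std in each channel.
```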

I do understand that I must have failed to reproduce the preprocessing I used during training, which is here:

If anyone could please have a look and let me know what I did wrong, because I cannot find the problem myself.

I've studied what happens to validation images under the hood in the Resize(), ToTensor() and Normalize() steps. What I'm doing seems to be identical, yet I get far more errors than on the validation data, which is very, very similar to the data the model has been dealing with since integration.

Thank you very much in advance.

Fun fact - when I do the image preprocessing and make sure my torch tensor is RGB and not BGR (predict function, beginning of the for loop where I do the permute), the model works even slightly worse. I'd expect it to work better, though, because it was trained with RGB images. Thanks, ColorJitter().
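(One thing worth double-checking here: in PyTorch, permute() reorders axes, e.g. HWC to CHW, but it never swaps BGR to RGB; reversing the channels needs indexing or a flip along the channel axis. A minimal sketch, with illustrative names:)

```python
import torch

# Sketch: converting a BGR CHW tensor to RGB by reversing the channel axis.
bgr = torch.tensor([[[10.0]], [[20.0]], [[30.0]]])  # shape (3, 1, 1): B, G, R

rgb = bgr.flip(0)  # reverse channel order; equivalently: bgr[[2, 1, 0]]

print(rgb[:, 0, 0].tolist())  # [30.0, 20.0, 10.0] -> R, G, B
```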


This is my guess: during training, both dimensions of the image are the same, because the width and height parameters of the resize function were set to 'input_size'. But during testing, the interpolation function was used with 'size = image_size'. In that case, the smallest dimension of the image is set to this value, and the bigger dimension is scaled accordingly to maintain the aspect ratio. So during the training phase the aspect ratio of the image is not maintained, while at test time it is.
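(The two behaviors described above can be illustrated with plain arithmetic mirroring torchvision's Resize semantics; the function name and the 1280x720 frame are just for illustration:)

```python
# Resize(s) with an int keeps the aspect ratio; Resize((H, W)) does not.
def resize_output_size(w, h, size):
    """Return the output (w, h) a Resize with the given size would produce."""
    if isinstance(size, tuple):               # Resize((H, W)): both edges forced
        return size[1], size[0]
    if w < h:                                 # Resize(s): smaller edge -> s,
        return size, round(h * size / w)      # larger edge scaled to keep ratio
    return round(w * size / h), size

print(resize_output_size(1280, 720, 224))         # (398, 224) - ratio kept
print(resize_output_size(1280, 720, (224, 224)))  # (224, 224) - ratio lost
```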


Hi @vgsprasad,

Thanks for your reply. Sorry, the screenshot I attached is a bit misleading. The value of 'image_size' is actually (224, 224), so it is the same as during training :frowning: