I trained a segmentation model in PyTorch, and it achieved an F1 score of 0.93 when tested on my local machine (Windows, conda, CUDA 10.2, PyTorch 1.2). However, the F1 score dropped to 0.3 when testing on a Linux server (conda, CUDA 9.0, PyTorch 1.1). I double-checked that the code, label files, and test sets were identical on both machines, and there is no “explicit” random sampling in my code (and even if there were, the effect shouldn’t be this drastic).
I tried the following to solve my problem:
I suspected that it might have something to do with PyTorch versions, so I installed PyTorch 1.1 on my Windows machine to match the server’s, but got the same 0.93 score. I then thought it had to do with CUDA itself, so I eliminated the GPU factor altogether and ran inference on the CPU on the server; I still got 0.3.
What could possibly be causing this huge discrepancy?
Could you check the output of a single input and post the max absolute difference between the predictions?
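A comparison like that could be sketched as follows. This is a hypothetical workflow, not your actual code: the model, checkpoint, and file names are placeholders, and the tensors at the bottom stand in for the two saved outputs.

```python
import torch

# On EACH machine (model loading omitted; paths are placeholders):
# model.eval()
# x = torch.load("fixed_input.pt")           # the same saved input on both machines
# with torch.no_grad():
#     out = model(x)
# torch.save(out.cpu(), "out_windows.pt")    # or "out_linux.pt" on the server

# Then, on ONE machine, load both files and compare.
# Here random tensors stand in for the two saved outputs:
torch.manual_seed(0)
out_windows = torch.randn(1, 1, 64, 64)
out_linux = out_windows + 0.01 * torch.randn_like(out_windows)

max_abs_diff = (out_windows - out_linux).abs().max().item()
print(f"max abs difference: {max_abs_diff:.6f}")
```

Differences around float precision (~1e-6) are expected across hardware; anything large points at a real divergence in the pipeline.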
If this difference is already high, the next step would be to bisect the model and check each layer’s output.
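The bisection could use forward hooks to record every layer’s output, which can then be saved on each machine and diffed pairwise to find the first layer that diverges. A minimal sketch, with a toy Sequential model standing in for the real segmentation network:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real segmentation model
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)
model.eval()

activations = {}

def save_activation(name):
    # Returns a hook that stores the layer's output under its module name
    def hook(module, inp, out):
        activations[name] = out.detach().cpu()
    return hook

for name, module in model.named_modules():
    if name:  # skip the root module itself
        module.register_forward_hook(save_activation(name))

torch.manual_seed(0)
x = torch.randn(1, 3, 32, 32)  # use the same saved input on both machines
with torch.no_grad():
    model(x)

# Save per-layer activations on each machine, e.g.
# torch.save(activations, "activations_windows.pt"),
# then load both files on one machine and find the first layer
# whose outputs differ significantly.
for name, act in activations.items():
    print(name, tuple(act.shape))
```

The first layer with a large discrepancy narrows the problem down to a specific op (or its inputs).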
Do you mean the pixelwise max absolute difference of a predicted mask? I’m doing binary segmentation, and I treat a prediction as positive if len(mask_pred[mask_pred > 0.5]) > 0.
Do the provided values represent the difference between both models?
I’m not sure I understand the last post correctly.
What was the same and what differs now?
Oh, I meant that 37.9 is the max difference on both machines before the sigmoid, and 0.99 is the max difference on both machines after the sigmoid. Although the overall max difference is the same, I noticed that the differences for individual samples are not the same across machines.
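A per-sample comparison would make that concrete. In this sketch the two tensors are stand-ins for the raw logits saved on each machine (before the sigmoid); the reduction computes one max-abs-diff value per sample:

```python
import torch

torch.manual_seed(0)
# Stand-ins for the saved per-sample logits from the two machines
logits_windows = torch.randn(8, 1, 16, 16)
logits_linux = logits_windows + 0.05 * torch.randn_like(logits_windows)

# Max absolute difference per sample: flatten everything but the batch dim
per_sample = (logits_windows - logits_linux).abs().flatten(1).max(dim=1).values
for i, d in enumerate(per_sample.tolist()):
    print(f"sample {i}: max abs diff {d:.4f}")
print(f"overall max abs diff: {per_sample.max().item():.4f}")
```

This shows whether the divergence is concentrated in a few samples or spread uniformly, which are quite different failure modes.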
Thanks, I’ll do that and post an update! Do you think it’s a PyTorch version issue? Maybe the implementation of some operations changed between 1.1.0 and 1.2.0?
Yes, I forgot to mention that I did test different versions. It’s just that the Linux server uses CUDA 9.0, and 1.1.0 is the highest PyTorch version it supports, if I’m not mistaken. I’m using Python 3.7 on both machines (3.7.3 on Linux, 3.7.6 on Windows).