Hi, I am not sure what your network architecture is, but if you are not initializing your layer weights explicitly, the default initial weights may differ from one PyTorch version to another. Other functions may also behave differently between releases.
I used xavier_uniform and xavier_normal in my custom attention layer, and the reset_parameters function in pre-defined PyTorch layers, e.g. nn.Linear, nn.LayerNorm, nn.Dropout.
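A minimal sketch of what this could look like, assuming a hypothetical attention layer (the class and method names here are illustrative, not from the original post); explicit initialization in `reset_parameters` means the starting weights no longer depend on a PyTorch version's default initialization scheme:

```python
import torch
import torch.nn as nn


class SimpleAttention(nn.Module):
    """Hypothetical custom attention layer with explicit weight init."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.reset_parameters()

    def reset_parameters(self):
        # Explicit initialization, so the starting weights do not depend
        # on version-specific PyTorch defaults.
        nn.init.xavier_uniform_(self.query.weight)
        nn.init.zeros_(self.query.bias)
        nn.init.xavier_normal_(self.key.weight)
        nn.init.zeros_(self.key.bias)
        # Built-in layers expose their own reset_parameters().
        self.norm.reset_parameters()


torch.manual_seed(0)
layer = SimpleAttention(8)
```

With a fixed seed before construction, two instances of the layer start from identical weights.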
Your answer concerns the reset_parameters function in the built-in torch layers.
Thanks for replying to my question. I will try it!
Also, if anyone else has a solution or advice, please share it!
Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.
There are numerous factors affecting reproducibility.
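To control the factors that are controllable on a single setup, one common approach is to seed every RNG in use and restrict cuDNN to deterministic behavior. A minimal sketch (the helper name `seed_everything` is my own, not from the original discussion):

```python
import random
import torch


def seed_everything(seed: int = 42):
    # Hypothetical helper: seeds the Python and PyTorch RNGs and asks
    # cuDNN for deterministic kernels. This helps runs match on one
    # machine and PyTorch version; it does not guarantee identical
    # results across versions, platforms, or CPU vs. GPU.
    random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True


seed_everything(0)
x = torch.randn(3)
seed_everything(0)
y = torch.randn(3)
```

Here `x` and `y` are identical because the same seed precedes each draw on the same setup; the same code on a different PyTorch release may produce different numbers.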
What about the quantitative results in terms of accuracy? You haven't discussed them. Could you please share whether they are roughly equal?
Are you saying that even if I train models in the same machine environment (same CPU and GPU), completely reproducible results are still not guaranteed across PyTorch releases?