PyTorch Model Reproducibility Issue: RTX 2080 Ti vs. RTX 4090
Background:
I'm training the same PyTorch model on two different GPUs and observing significant differences in results: on the RTX 4090 the model appears to overfit and fails to match the results achieved on the RTX 2080 Ti.
Environment & Hardware Details
1. RTX 2080 Ti (Original Training Setup)
{
'CUDA available': True,
'CUDA_HOME': None,
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0',
'GPU 0': 'NVIDIA GeForce RTX 2080 Ti',
'LibMTL': 'modify 1.1.6',
'Numpy': '1.21.5',
'Platform': 'linux',
'PyTorch': '1.8.0',
'Python': '3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]',
'TorchVision': '0.9.0'
}
Epoch 5 Results (2080 Ti):
[2022-11-29 12:02:56,707][INFO][recorder.py][85] - Epoch: 5 ----- Mode: train
{'buchwald': {'loss': 0.53, 'metrics': {'R2': 0.715}},
'time': {'backward_time': 12.35, 'data_time': 1.145, 'forward_time': 8.286}}
----------------------------------------------------------------------------------------------------
[2022-11-29 12:02:59,903][INFO][recorder.py][85] - Epoch: 5 ----- Mode: val
{'buchwald': {'loss': 0.537, 'metrics': {'R2': 0.698}},
'time': {'data_time': 0.014, 'forward_time': 2.049}}
----------------------------------------------------------------------------------------------------
[2022-11-29 12:03:05,232][INFO][recorder.py][85] - Epoch: 5 ----- Mode: test
{'buchwald': {'loss': 0.502, 'metrics': {'R2': 0.749}},
'time': {'data_time': 0.025, 'forward_time': 4.171}}
2. RTX 4090 (New Training Setup)
{
'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
'GPU 0,1,2,3,4,5': 'NVIDIA GeForce RTX 4090',
'LibMTL': 'modify 1.1.6',
'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
'Numpy': '2.0.2',
'Platform': 'linux',
'PyTorch': '2.7.1+cu128',
'Python': '3.9.23 (main, Jun 5 2025, 13:40:20) [GCC 11.2.0]',
'TorchVision': '0.22.1+cu128'
}
Epoch 5 Results (4090):
[2025-07-17 18:02:06,172][INFO][recorder.py][85] - Epoch: 5 ----- Mode: train
{'buchwald': {'loss': np.float64(0.284), 'metrics': {'R2': 0.924}},
'time': {'backward_time': 5.906, 'data_time': 0.033, 'forward_time': 10.398}}
----------------------------------------------------------------------------------------------------
[2025-07-17 18:02:08,721][INFO][recorder.py][85] - Epoch: 5 ----- Mode: val
{'buchwald': {'loss': np.float64(0.631), 'metrics': {'R2': 0.593}},
'time': {'data_time': 0.001, 'forward_time': 2.539}}
----------------------------------------------------------------------------------------------------
[2025-07-17 18:02:14,147][INFO][recorder.py][85] - Epoch: 5 ----- Mode: test
{'buchwald': {'loss': np.float64(0.549), 'metrics': {'R2': 0.71}},
'time': {'data_time': 0.0, 'forward_time': 5.41}}
Key Observations:
- Overfitting on the 4090: the training loss (0.284) is much lower than on the 2080 Ti (0.53), but both the validation loss (0.631 vs. 0.537) and the test loss (0.549 vs. 0.502) are worse. The R2 metric drops significantly on validation (0.593 vs. 0.698) and on test (0.71 vs. 0.749).
- Failed to reproduce original performance: even after hyperparameter tuning, the model on the 4090 cannot reach the previous best R2 of 0.95 achieved on the 2080 Ti.
Possible Causes & Questions:
- Numerical differences: Are there any known numerical or non-determinism differences between PyTorch 1.8.0 (on the 2080 Ti) and 2.7.1+cu128 (on the 4090) that could explain this?
- Hardware differences: Does the 4090's architecture (Ada Lovelace vs. Turing) affect training dynamics, e.g. through reduced-precision TF32 math that Turing does not support?
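To rule out the two causes above, here is a sketch of the settings I'm considering (flag names are from the current PyTorch reproducibility and CUDA-semantics docs, not something I have verified fixes this; the seed value 42 is arbitrary). Note that the 2080 Ti (Turing) has no TF32 units, so it always ran true FP32, while on the 4090 PyTorch enables TF32 for cuDNN convolutions by default:

```python
import os
import random

import numpy as np
import torch

# 1) Force full FP32 on the 4090 so matmul/conv precision matches Turing.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

# 2) Seed every RNG the training loop might touch.
def seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)          # also seeds CUDA RNGs on current PyTorch
    torch.cuda.manual_seed_all(seed)

seed_everything(42)

# 3) Determinism flags (can slow training; ops without a deterministic
#    implementation will warn rather than raise with warn_only=True).
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some cuBLAS ops
torch.use_deterministic_algorithms(True, warn_only=True)
```

Even with all of this, bitwise-identical results across different GPU architectures and PyTorch versions are not guaranteed; I'd mainly expect the TF32 flags to close a precision gap if that is the cause.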
Any insights or suggestions would be greatly appreciated!