Different results for the same model trained on different devices

We are training a network on medical images for a segmentation task. We ran the same training on two different devices and got different results: one device gave us 83% and the other 91.5%, a gap of 8.5 percentage points. The devices' specs and environments are below:

First device:
System: Intel Core i9-7920X CPU, NVIDIA Titan V GPU, Linux-x86_64, Ubuntu 18.04.6 LTS, NVIDIA Driver Version 525.105.17.
Libs: spyder 5.5.1, python 3.12.7, pytorch 2.5.1, pytorch-cuda 12.1, torchvision 0.20.1, monai 1.4.0, numpy 1.26.4, wandb 0.19.4, einops 0.8.0, matplotlib 3.9.2, albumentations 0.0.10, scipy 1.13.1, scikit-learn 1.5.1, timm 1.0.14

Second device:
System: Intel Core i7-14700K CPU, NVIDIA RTX 4070 Ti Super GPU, Windows 11 Pro, Version 10.0.22631.
Libs: albumentations 2.0.5, einops 0.8.1, logging 0.4.9.6, matplotlib 3.10.1, monai 1.4.0, numpy 1.26.4, python 3.11.7, scikit-learn 1.6.1, scipy 1.15.2, timm 1.0.15, torch 2.6.0+cu118, torchvision 0.21.0, wandb 0.19.10
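
For reference, before comparing the two machines we could pin every RNG source and force deterministic kernels on both. This is a minimal sketch of a standard PyTorch determinism setup (the `seed_everything` helper and the seed value 42 are our own, not from any library); it will not remove hardware- or library-version differences, but it rules out seeding as the cause:

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin all RNG sources so runs are comparable across machines."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic cuDNN kernels; disables the autotuner,
    # which may slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required by cuBLAS for deterministic matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # warn_only=True logs a warning instead of raising when an op
    # has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(42)
```

Note that even with identical seeds, different GPU architectures (Titan V vs. RTX 4070 Ti Super) and different CUDA/cuDNN builds can still produce slightly different floating-point results, so this controls only the software-side randomness.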