Can't get reproducible results with GPU

Hi everyone,
I am using PyTorch 1.10.2 (cudnn=8.1.1_11.2) with both Jupyter and VS Code. My results are reproducible on the CPU but not on the GPU (I use only one GPU). My prediction model is a 3D ResNet: I remove the last two layers of ResNet34 (the pooling and FC layers) and add my own classifier, which includes ConvTranspose3d and AvgPool3d (both non-deterministic algorithms). I also use random cropping for data augmentation. I set the seeds as follows:

def setup_seed(seedno):
    os.environ['PYTHONHASHSEED'] = str(seedno)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def _init_fn(worker_id):
    worker_seed = torch.initial_seed() % 2**32

g = torch.Generator()

train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=40, prefetch_factor=256, drop_last=True, worker_init_fn=numpy.random.seed(int(42)))

val_loader = DataLoader(validation_dataset, batch_size=1, shuffle=False, num_workers=40, prefetch_factor=256, drop_last=True, worker_init_fn=numpy.random.seed(int(42)))

test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=40, prefetch_factor=256, drop_last=True, worker_init_fn=numpy.random.seed(int(42)))
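For comparison, here is a minimal sketch of the seeding recipe described on the PyTorch Reproducibility page, written under a few assumptions. The names `seed_worker` and `make_loader` are hypothetical helpers, not part of the code above. One difference worth noting: `worker_init_fn` expects a callable, whereas `numpy.random.seed(int(42))` is evaluated immediately and passes `None` to the DataLoader. This sketch only addresses seeding; it does not make non-deterministic CUDA kernels deterministic.

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def setup_seed(seed):
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def seed_worker(worker_id):
    # Derive the NumPy and Python seeds from the per-worker torch seed.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed, **kwargs):
    # Hypothetical helper: a seeded generator controls shuffling order.
    g = torch.Generator()
    g.manual_seed(seed)
    # Pass the function itself, not the result of calling it.
    return DataLoader(dataset, worker_init_fn=seed_worker, generator=g, **kwargs)
```

A call such as `make_loader(train_dataset, 42, batch_size=4, shuffle=True, num_workers=40, drop_last=True)` would then stand in for the `DataLoader(...)` calls above.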

When I use torch.use_deterministic_algorithms(True), I get the following error:
“RuntimeError: avg_pool3d_backward_cuda does not have a deterministic implementation…”.

The batches in train_loader are exactly the same across two runs, but the test accuracies vary by about 5% on the GPU (the results are exactly the same on the CPU). The Reproducibility page of the PyTorch 2.0 documentation says that some algorithms (e.g. AvgPool3d) are non-deterministic and cannot provide reproducible results. I would appreciate any suggestions on this issue.

You could create a feature request on GitHub for the missing deterministic implementation of the 3D pooling layer, or move that layer to the CPU as a workaround for now.
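The CPU workaround could look roughly like the sketch below. `CpuAvgPool3d` is a hypothetical wrapper name, not a PyTorch class: it copies the input to the CPU, pools there, and moves the result back to the input's device, so the pooling backward pass also runs on the CPU. The extra device transfers cost time on every forward and backward pass, and other non-deterministic layers (such as ConvTranspose3d) are not affected by it.

```python
import torch
import torch.nn as nn


class CpuAvgPool3d(nn.Module):
    """Hypothetical wrapper that runs AvgPool3d on the CPU."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # AvgPool3d has no parameters, so it is unaffected by model.cuda().
        self.pool = nn.AvgPool3d(*args, **kwargs)

    def forward(self, x):
        # Autograd tracks the device copies, so gradients flow back to x.
        return self.pool(x.cpu()).to(x.device)
```

It would be used as a drop-in replacement in the classifier, for example `CpuAvgPool3d(kernel_size=2)` in place of `nn.AvgPool3d(kernel_size=2)`.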

@ptrblck thank you for your reply. I prefer to use the GPU, since training the model on the CPU takes too long. Apparently, there is no way to avoid this reproducibility issue when using some non-deterministic algorithms on the GPU.