Hi everyone,
I am using PyTorch 1.10.2 (cudnn=8.1.1_11.2) from both Jupyter and VS Code. My results are reproducible on the CPU but not on the GPU (I use a single GPU). My prediction model is a 3D ResNet: I remove the last two layers of ResNet34 (the pooling and FC layers) and add my own classifier, which includes ConvTranspose3d and AvgPool3d (both non-deterministic algorithms). I also apply random cropping for data augmentation. I set the seeds as follows:
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def setup_seed(seedno):
    # Seed every RNG the training run touches.
    torch.manual_seed(seedno)
    os.environ['PYTHONHASHSEED'] = str(seedno)
    torch.cuda.manual_seed(seedno)
    torch.cuda.manual_seed_all(seedno)
    np.random.seed(seedno)
    random.seed(seedno)
    # Ask cuDNN for deterministic kernels and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def _init_fn(worker_id):
    # Give each DataLoader worker its own reproducible NumPy/random seed.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

setup_seed(42)
g = torch.Generator()
g.manual_seed(42)

# worker_init_fn must be the function itself (np.random.seed(42) would run
# immediately and pass None); generator=g seeds the shuffling.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=40, prefetch_factor=256, drop_last=True, worker_init_fn=_init_fn, generator=g)
val_loader = DataLoader(validation_dataset, batch_size=1, shuffle=False, num_workers=40, prefetch_factor=256, drop_last=True, worker_init_fn=_init_fn, generator=g)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=40, prefetch_factor=256, drop_last=True, worker_init_fn=_init_fn, generator=g)
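To rule out the data pipeline as the source of the variation, the seeded-loader setup can be checked in isolation. The following is only a minimal sketch: `batch_order` and `seed_worker` are hypothetical helper names, and a toy `TensorDataset` of 16 indices stands in for the real `train_dataset`.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Derive each worker's NumPy/random seed from the seed PyTorch assigned to it.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def batch_order(seed, num_workers=0):
    """Return the sample indices of one shuffled epoch, in serving order."""
    dataset = TensorDataset(torch.arange(16))  # toy stand-in for train_dataset
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,  # pass the function itself, do not call it
        generator=g,                 # seeds the shuffling order
    )
    return [batch[0].tolist() for batch in loader]


if __name__ == "__main__":
    # Two loaders built with the same generator seed should serve the same batches.
    print(batch_order(42) == batch_order(42))
```

If the two epochs match here but training still diverges on the GPU, the remaining differences most likely come from the CUDA kernels rather than from data loading.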
When I call torch.use_deterministic_algorithms(True), I get the following error:
“RuntimeError: avg_pool3d_backward_cuda does not have a deterministic implementation…”.
The batches in train_loader are exactly the same across two runs, but the test accuracies differ by about 5% on the GPU (on the CPU the results are identical). The Reproducibility page of the PyTorch 2.0 documentation says that some algorithms (e.g. AvgPool3d) are non-deterministic and cannot give reproducible results. I would appreciate any suggestions on this issue.
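One direction that might be worth testing is to avoid the avg_pool3d backward kernel altogether. The sketch below assumes the pooling windows do not overlap (stride equal to kernel size, no padding, ceil_mode off), which may not match the classifier described above. Under that assumption, average pooling can be written as a reshape followed by a mean, so autograd never dispatches to avg_pool3d_backward_cuda. `ReshapeAvgPool3d` is a hypothetical name, not a PyTorch class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReshapeAvgPool3d(nn.Module):
    """Average pooling over non-overlapping 3D windows via reshape + mean."""

    def __init__(self, kernel_size):
        super().__init__()
        if isinstance(kernel_size, int):
            kernel_size = (kernel_size,) * 3
        self.kernel_size = tuple(kernel_size)

    def forward(self, x):
        n, c, d, h, w = x.shape
        kd, kh, kw = self.kernel_size
        # Number of complete windows along each axis; trailing voxels that do
        # not fill a window are cropped, as AvgPool3d does with ceil_mode=False.
        d2, h2, w2 = d // kd, h // kh, w // kw
        x = x[:, :, : d2 * kd, : h2 * kh, : w2 * kw]
        # Split every spatial axis into (windows, window size), then average
        # over the three window-size axes.
        x = x.reshape(n, c, d2, kd, h2, kh, w2, kw)
        return x.mean(dim=(3, 5, 7))


if __name__ == "__main__":
    x = torch.randn(2, 3, 5, 6, 7, dtype=torch.float64)
    out = ReshapeAvgPool3d((2, 3, 2))(x)
    ref = F.avg_pool3d(x, (2, 3, 2))
    print(torch.allclose(out, ref))
```

This only addresses AvgPool3d. Whether torch.use_deterministic_algorithms(True) then accepts the rest of the model, including ConvTranspose3d, would still need to be checked on the GPU.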