Hi,
I am having the same issue: I cannot get reproducible results when training the Faster R-CNN model in PyTorch, even though I have followed everything in the Reproducibility doc.
I set the seed at the beginning of my code as follows:
import os
import random

import numpy as np
import torch

g = torch.Generator()
g.manual_seed(10)

def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

def set_seed(seed):
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = False
    # torch.use_deterministic_algorithms(True)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

set_seed(10)
I have also disabled the augmentation and set the number of workers to 0 in the data loader:
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn, worker_init_fn=seed_worker, generator=g)
Model is created as follows:
self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
num_classes = 2 # 1 class (nodule) + background
in_features = self.model.roi_heads.box_predictor.cls_score.in_features
self.model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
self.model.to(self.device)
I use CUDA 10.2 with torch version 1.10.0, so I have also set the environment variable CUBLAS_WORKSPACE_CONFIG as mentioned in the tutorial. Do you have any idea why the results are not reproducible, and any suggestions for what I could try?
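A minimal sketch of how this variable can be set from inside the script rather than in the shell. It assumes the value :4096:8, one of the two values listed in the Reproducibility notes (the other is :16:8), and the helper cublas_workspace_is_configured is a hypothetical name used only for this illustration.

```python
import os

# Assumption: ":4096:8" is used here; ":16:8" is the other value listed in
# the PyTorch Reproducibility notes. The variable should be in the
# environment before any CUDA work starts, so this sits at the very top of
# the script, ahead of the torch import.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


def cublas_workspace_is_configured() -> bool:
    """Return True if the variable holds one of the two documented values.

    Hypothetical helper, only meant as a quick sanity check.
    """
    return os.environ.get("CUBLAS_WORKSPACE_CONFIG") in (":4096:8", ":16:8")
```

Calling cublas_workspace_is_configured() right before the training loop is a cheap way to confirm the value actually reached the Python process, which may not be the case if it was exported in a different shell or notebook kernel.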
Also, if I set torch.use_deterministic_algorithms(True) after setting CUBLAS_WORKSPACE_CONFIG, I get the following error:
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor10231
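In case it helps with narrowing this down, here is a rough sketch of how the per-iteration losses of two runs could be compared to find the first step where they differ. The function name first_divergence and the idea of logging one float per iteration are assumptions made for this illustration, not something taken from the training code above.

```python
import math
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[float],
                     run_b: Sequence[float],
                     tol: float = 0.0) -> Optional[int]:
    """Return the first iteration index where two loss logs differ.

    Losses count as different when their absolute difference exceeds tol,
    or when exactly one of them is NaN. Returns None if the logs agree.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if math.isnan(a) or math.isnan(b):
            if math.isnan(a) and math.isnan(b):
                continue  # both NaN: treat as matching
            return step
        if abs(a - b) > tol:
            return step
    # One run logged more iterations than the other.
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None
```

With tol left at 0.0 the comparison is exact, which is what bitwise reproducibility would require. If the first differing index is 0, the difference is already present in the first forward pass; a later index would point more towards the backward pass or the optimizer step.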
Many thanks, Ecem