Problem reproducing results even with the same seed

Hi,
I'm trying to reproduce results using Contrastive Unpaired Translation (CUT).
I set the same seed at the beginning of my train function using:

import os
import random

import numpy as np
import torch


def set_seed(seed):
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)
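
I call set_seed once at the very top of training, before the networks, optimizers, and data loader are built (a minimal sketch; create_dataset, create_model, and opt are placeholder names standing in for my actual setup code):

def train(opt, seed=42):
    set_seed(seed)                 # seed everything before any construction
    dataset = create_dataset(opt)  # data pipeline is built after seeding
    model = create_model(opt)      # networks and optimizers likewise
    for epoch in range(opt.n_epochs):
        for data in dataset:
            model.set_input(data)
            model.optimize_parameters()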

My batch size is 1 and the dataset contains 10 patches, so each epoch has 10 iterations.
Since I want two separate runs to produce identical results, I compared them iteration by iteration: the first iteration is consistent, and everything before the first weight update matches, but once the weights are updated with the Adam optimizer, the outputs of the two runs start to diverge.
It should be noted that the data loader yields the same samples in each run and the number of workers is set to zero.
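
For completeness, this is roughly how the data-loading side is pinned down (a minimal sketch; dataset is a placeholder for my actual patch dataset, and the seed matches the one passed to set_seed):

import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive a per-worker seed from the base seed; a no-op safeguard here,
    # since num_workers is 0 and loading happens in the main process.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

loader = DataLoader(
    dataset,                    # placeholder for my patch dataset
    batch_size=1,
    shuffle=True,
    num_workers=0,              # as in my setup
    worker_init_fn=seed_worker,
    generator=g,                # fixes the shuffle order across runs
)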
Here is the optimizer code:

def optimize_parameters(self):
    # forward
    self.forward()

    # update D
    self.set_requires_grad(self.netD, True)
    self.optimizer_D.zero_grad()
    self.loss_D_Y = self.compute_D_loss()
    print('self.loss_D_Y :', self.loss_D_Y)
    self.loss_D_Y.backward()
    self.optimizer_D.step()

    # update G and H, with D frozen
    self.set_requires_grad(self.netD, False)
    self.optimizer_G.zero_grad()
    self.optimizer_H.zero_grad()
    self.loss_G = self.compute_G_loss()
    print('self.loss_G :', self.loss_G)
    self.loss_G.backward()
    self.optimizer_G.step()
    self.optimizer_H.step()

And here is the first run:

-------------------------- itr no.: 0
self.loss_D_Y : tensor(0.55703777, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.31012464, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 1
self.loss_D_Y : tensor(1.02021098, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.67312062, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 2
self.loss_D_Y : tensor(0.81242311, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.35670471, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 3
self.loss_D_Y : tensor(0.59492981, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.63630438, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 4
self.loss_D_Y : tensor(0.78508723, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.22501385, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 5
self.loss_D_Y : tensor(0.37921441, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.00249624, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 6
self.loss_D_Y : tensor(0.34842652, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.39195514, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 7
self.loss_D_Y : tensor(0.52215433, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.19204855, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 8
self.loss_D_Y : tensor(0.39162478, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.22784519, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 9
self.loss_D_Y : tensor(0.84793001, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.31176651, device='cuda:0', grad_fn=<...>)

And here is the second run:

-------------------------- itr no.: 0
self.loss_D_Y : tensor(0.55703777, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.31012487, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 1
self.loss_D_Y : tensor(1.02020955, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.67310870, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 2
self.loss_D_Y : tensor(0.81234312, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.35686684, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 3
self.loss_D_Y : tensor(0.59236562, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.62531424, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 4
self.loss_D_Y : tensor(0.78288686, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.22760558, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 5
self.loss_D_Y : tensor(0.36207247, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.04722524, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 6
self.loss_D_Y : tensor(0.39092541, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.51113224, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 7
self.loss_D_Y : tensor(0.57800019, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.30624080, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 8
self.loss_D_Y : tensor(0.33473897, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.16657138, device='cuda:0', grad_fn=<...>)
-------------------------- itr no.: 9
self.loss_D_Y : tensor(0.82777905, device='cuda:0', grad_fn=<...>)
self.loss_G : tensor(1.19245756, device='cuda:0', grad_fn=<...>)

Could you please help me understand the source of this problem?

Check the Reproducibility docs, which explain how to enable deterministic algorithms via torch.use_deterministic_algorithms(True).
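
A minimal sketch of what that looks like (per the Reproducibility docs, CUBLAS_WORKSPACE_CONFIG must be set before CUDA initializes for deterministic cuBLAS on CUDA >= 10.2; the seed value here is arbitrary):

import os

# Must be set before any CUDA work for deterministic cuBLAS (CUDA >= 10.2).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.manual_seed(0)
torch.use_deterministic_algorithms(True)

# From here on, ops without a deterministic implementation raise a
# RuntimeError instead of silently running a nondeterministic kernel.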

Thank you so much.
It works for me now. Enabling torch.use_deterministic_algorithms(True) pointed me to the problem: I was using nn.ReflectionPad3d(1), which is not deterministic. I changed it to zero padding and it works now.
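
For anyone hitting the same issue, the change amounts to something like this (a minimal sketch; the Conv3d channel and kernel sizes are placeholders, not my actual layer):

import torch.nn as nn

# Before: reflection padding, whose CUDA backward is nondeterministic and
# raises a RuntimeError under torch.use_deterministic_algorithms(True).
block = nn.Sequential(
    nn.ReflectionPad3d(1),
    nn.Conv3d(16, 16, kernel_size=3),
)

# After: zero padding, which is deterministic; it can be folded directly into
# the convolution via its padding argument (padding_mode='zeros' is the default).
block = nn.Sequential(
    nn.Conv3d(16, 16, kernel_size=3, padding=1),
)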