Weird Reproducibility Issue

I have a pretty big network which can be described as a ResNet + test network. Previously I was only training the test network and always got reproducible results by fixing seeds and a few other settings. By reproducible I mean that every epoch of every run gave exactly the same result as the same epoch of the previous run.

Now I am trying to fine-tune the whole network, which means the ResNet is part of the training loop. My results are no longer exactly reproducible; each epoch differs by a small margin. For example, across 3 different runs the average precision (AP) after the 1st epoch was: 25.101604, 25.100491, 25.095660.

I compared the initial models between these runs before starting any training and they are exactly the same, and I also made sure that every run gets the same samples.
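
For reference, the model comparison is basically a bitwise check of the state_dicts (a minimal sketch; sd1 and sd2 stand for the checkpoints saved by the two runs):

import torch

def state_dicts_equal(sd1, sd2):
    # exact element-wise equality for every parameter and buffer
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)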

I know the differences are not that large, but I am really interested to know the reason. Does anyone have an idea?

Did you change anything between the runs, or are you running in exactly the same setup?
Are you also using the same script and following the recommendations from the reproducibility docs?

I am using the exact same script and didn’t change anything. I read the reproducibility docs and follow them like this:

import os
import random
import numpy as np
import torch

seed = 10
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True  # pick deterministic cuDNN algorithms
torch.backends.cudnn.benchmark = False     # disable the cuDNN autotuner
np.random.seed(seed)
random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

def _init_fn(worker_id):
    # reseed numpy in every DataLoader worker
    np.random.seed(int(seed))
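
For completeness, _init_fn is meant to be passed to the DataLoader, roughly like this (a sketch; my_dataset is a placeholder for my actual dataset):

from torch.utils.data import DataLoader

loader = DataLoader(my_dataset,
                    batch_size=8,
                    shuffle=True,
                    num_workers=4,
                    worker_init_fn=_init_fn)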

Also, a few interesting things to notice. I compared the models between two runs, before and after the first 20 iterations, with a batch size of 8 per iteration. Before any iteration the models of the two runs are exactly the same; after 20 iterations the models only differ in my test net, while the ResNet part remains identical. I also compared the loss at each iteration and found that the first difference occurs in the 7th iteration, which means 48 images pass through without any discrepancy before it appears. The losses are:
0.703749358654 (1st run) vs. 0.703749418259 (2nd run)

I think this may be coming from the limited precision of floating point numbers.
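
Just to illustrate what I mean: in float32 even the order of a summation changes the last digits (a minimal sketch):

import torch

a = torch.randn(10000)
print(a.sum().item())          # one accumulation order
print(a.flip(0).sum().item())  # same values summed in reverse; typically differs in the last digits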

The difference is most likely coming from the limited floating point precision.
However, if I understand the issue correctly, you were able to get exactly the same results before trying to fine-tune the model? If that’s correct, what exactly did you change for the fine-tuning run?

My pretrained layers’ prefix was ‘Conv_pretrain’, so I just changed the name in the code so that the first condition never gets executed:

trainables, spmap, not_trainables = [], [], []

for name, p in res.named_parameters():
    if name.split('.')[0] == 'Conv_pretrainn':  # misspelled on purpose, so this branch never runs
        p.requires_grad = False
        not_trainables.append(p)
    else:
        if name.split('.')[0] == 'conv_sp_map' or name.split('.')[0] == 'spmap_up':
            spmap.append(p)
        else:
            trainables.append(p)

optim1 = optim.SGD([{"params": trainables, "lr": learning_rate},
                    {"params": spmap, "lr": 0.001}],
                   momentum=0.9, weight_decay=0.0001)
lambda1 = lambda epoch: 1.0 if epoch < 10 else (0.1 if epoch < 31 else 0.01)
lambda2 = lambda epoch: 1
scheduler = optim.lr_scheduler.LambdaLR(optim1, [lambda1, lambda2])

I can’t see any introduced randomness in the code.
You have a typo in pretrainn, but I assume that’s a copy-paste issue?

Could you post an executable code snippet to reproduce this issue, please?
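
As a side note (assuming a newer PyTorch release, 1.8+), you can also force deterministic implementations globally, so that any operation without a deterministic variant raises an error instead of silently adding noise:

torch.use_deterministic_algorithms(True)
# some CUDA ops additionally need this env var (with CUDA 10.2+):
# os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'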

No, it’s not a typo. Previously I was not including the ResNet part in my training parameters; now, since I wrote pretrainn, the first condition never gets executed.
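
A quick way to confirm that every parameter now gets gradients:

for name, p in model.named_parameters():
    print(name, p.requires_grad)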

Unfortunately, I couldn’t create a small executable version of the issue. I tried the following code, but I can’t reproduce the problem with it; maybe it is related to my dataset. Thanks a lot for all the help.

from __future__ import print_function, division
import os
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

seed = 10
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)

model = models.resnet152(pretrained=True)
sigmoid = nn.Sigmoid()
loss = nn.BCEWithLogitsLoss(reduction='mean')


class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()

    def forward(self, x):
        return x.view(x.size()[0], -1)


class testnet(nn.Module):
    def __init__(self):
        super(testnet, self).__init__()
        self.flat = Flatten()
        # first 7 children of the pretrained model (ResNets, ResNeXt)
        self.Conv_pretrain = nn.Sequential(*list(model.children())[0:7])
        self.linear1 = nn.Linear(200704, 1000)
        self.linear2 = nn.Linear(1000, 1)

    def forward(self, x):
        out1 = self.linear2(self.linear1(self.flat(self.Conv_pretrain(x))))
        return out1


if __name__ == '__main__':
    all_ = 400
    inputs = torch.rand(all_, 3, 224, 224).float()
    labels = torch.LongTensor(all_).random_(0, 2).float().cuda()
    model = testnet().cuda().float()
    # freeze only the first children; longer lists (up to 'Conv_pretrain.6.8') were also tried
    frozen_points = ['Conv_pretrain.0', 'Conv_pretrain.1']

    trainables1 = []
    trainables2 = []
    not_trainables = []
    pretrain_tune = []
    for name, p in model.named_parameters():
        if name.split('.')[0] == 'Conv_pretrain':
            if name[0:15] in frozen_points or (name[0:17] in frozen_points and name[17] == '.'):
                p.requires_grad = False
                not_trainables.append(p)
            else:
                pretrain_tune.append(p)
        else:
            if name[0:7] == 'linear1':
                trainables1.append(p)
            elif name[0:7] == 'linear2':
                trainables2.append(p)

    optimizer = optim.SGD([{"params": trainables1, "lr": 0.001},
                           {"params": trainables2, "lr": 1e-4},
                           {"params": pretrain_tune, "lr": 1e-5}],
                          momentum=0.9, weight_decay=0.0001)
    lambda1 = lambda epoch: 1.0 if epoch < 10 else (10 if epoch < 31 else 1)
    lambda2 = lambda epoch: 1
    lambda3 = lambda epoch: 1
    # scheduler.step() would be called once per epoch; omitted in this single-pass test
    scheduler = optim.lr_scheduler.LambdaLR(optimizer, [lambda1, lambda2, lambda3])

    batch = 8
    l = 0
    while l < all_:
        optimizer.zero_grad()
        curr_input = inputs[l:l + batch].cuda()
        out = model(curr_input).squeeze()
        print(sigmoid(out).cpu().detach().sum())
        lossf = loss(out, labels[l:l + batch])
        lossf.backward()
        optimizer.step()
        l += batch