Same seed, but different results across two runs

I have just written a simple model to classify CIFAR-10, following the approach in this tutorial: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html#sphx-glr-beginner-transfer-learning-tutorial-py

I ran it twice with the same seed:

torch.manual_seed(60)
torch.cuda.manual_seed(60)

I also set shuffle=False on the dataset loaders and used no transforms that could introduce randomness.
In addition, my network parameters are loaded from existing weights, to avoid random initialization.

But during training, the differences between the network weights of the two runs became more and more pronounced as the epochs went on (after the first epoch the differences were only about 1e-9, after the second about 1e-7, and they kept growing).
Why does this happen? Is it something like floating-point computation error, or are there still some sources of randomness I have overlooked? Thanks in advance.

If you are using cuDNN, you should enable its deterministic mode.
This might make your code quite slow, but it is a good way to check your code, and you can deactivate it again later.

torch.backends.cudnn.deterministic = True
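As a rough illustration of the suggestion above, the sketch below gathers the usual seeding calls and the cuDNN flags into one helper. The name seed_everything is hypothetical (it is not a PyTorch function), and disabling cudnn.benchmark is an extra assumption beyond what the reply mentions. Even with these settings, some GPU operations may still be non-deterministic.

```python
import random

import torch


def seed_everything(seed: int = 60) -> None:
    """Hypothetical helper: seed the Python and PyTorch RNGs and ask cuDNN for deterministic kernels."""
    random.seed(seed)                 # Python's built-in RNG
    torch.manual_seed(seed)           # PyTorch CPU generator
    torch.cuda.manual_seed_all(seed)  # all CUDA devices; silently ignored without CUDA
    torch.backends.cudnn.deterministic = True  # restrict cuDNN to deterministic algorithms
    torch.backends.cudnn.benchmark = False     # assumption: turn off the autotuner, which can pick different algorithms per run


seed_everything(60)
first = torch.randn(3)
seed_everything(60)
second = torch.randn(3)
print(torch.equal(first, second))  # the two CPU draws should match
```

If NumPy is used anywhere in the data pipeline, np.random.seed(seed) would need to be added to the helper as well.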

Thanks for your reply, but it still can’t solve my problem…

Is your weight initialization determined by this random seed? If not, that could explain the difference.

Hmm, as I described, my model loads existing weights that were trained earlier, so the initialization is fixed and involves no randomness.

Are you using multiple workers in your DataLoader or any other random functions e.g. from numpy?

Yes I used multiple workers as below:

train_loader        = DataLoader(cifar10.Cifar10(mode='train',  dataset_size=DATASET_SIZE, binclassify=None), shuffle=False, batch_size=BATCH_SIZE, num_workers=BATCH_SIZE)
test_loader         = DataLoader(cifar10.Cifar10(mode='test', dataset_size=DATASET_SIZE, binclassify=None), shuffle=False, batch_size=BATCH_SIZE, num_workers=BATCH_SIZE)
validation_loader   = DataLoader(cifar10.Cifar10(mode='validation', dataset_size=DATASET_SIZE, binclassify=None), shuffle=False, batch_size=BATCH_SIZE, num_workers=BATCH_SIZE)

But I couldn’t find any place where other random functions are used.

Could you, just for debugging purposes, set num_workers=1 and see whether the first few iterations differ in a similar way?
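For anyone debugging the DataLoader side, here is a minimal sketch of making the loader's own randomness repeatable. It assumes a toy TensorDataset in place of the Cifar10 dataset from the post, and shuffle=True so that the effect of the seeded generator is visible. The names seed_worker and make_loader are illustrative, not part of the poster's code.

```python
import random

import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Inside a worker process, torch.initial_seed() is derived from the loader's base seed.
    # Re-seed Python's RNG from it so per-worker random calls repeat across runs.
    random.seed(torch.initial_seed() % 2**32)


def make_loader(dataset, batch_size: int, num_workers: int, seed: int) -> DataLoader:
    generator = torch.Generator()
    generator.manual_seed(seed)  # controls shuffling order and the workers' base seed
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,  # only called when num_workers > 0
        generator=generator,
    )


# Toy stand-in dataset: 20 samples holding the values 0..19.
toy = TensorDataset(torch.arange(20, dtype=torch.float32).unsqueeze(1))

# num_workers=0 keeps the demo in a single process.
run_a = [batch[0].flatten().tolist() for batch in make_loader(toy, 4, 0, seed=60)]
run_b = [batch[0].flatten().tolist() for batch in make_loader(toy, 4, 0, seed=60)]
print(run_a == run_b)  # same seed, same batch order
```

Comparing the first few batches of two runs this way helps tell apart a data-loading difference from a numerical one.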

Clearly, these randomizers do not generate the same sequence; here’s an example:

torch.cuda.manual_seed(60)

torch.cuda.FloatTensor(1).normal_()
Out[94]: tensor([ 0.7700], device='cuda:0')

torch.cuda.FloatTensor(1).normal_()
Out[95]: tensor([ 0.5048], device='cuda:0')

torch.cuda.manual_seed(60)

torch.cuda.FloatTensor(1).normal_()
Out[97]: tensor([ 0.7700], device='cuda:0')

torch.manual_seed(60)
Out[98]: <torch._C.Generator at 0x7f537c0320b0>

torch.randn(1)
Out[99]: tensor([ 0.7534])

torch.randn(1)
Out[100]: tensor([ 1.8541])

torch.manual_seed(60)
Out[101]: <torch._C.Generator at 0x7f537c0320b0>

torch.randn(1)
Out[102]: tensor([ 0.7534])

Or am I missing something here?

The seeds work for the CPU and GPU separately, but cannot generate the same random numbers for CPU and GPU.
torch.manual_seed(SEED) will also seed the GPU, but the PRNGs used on the GPU and CPU are different. The code should nevertheless yield deterministic results when run on a given device. As far as I know, it is currently not possible to get the same random numbers on different devices. It’s probably comparable to seeding in PyTorch vs. numpy: both will yield deterministic results, but not the same numbers.
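A small sketch of the point above: re-seeding reproduces the sequence within one generator, while different generators (PyTorch CPU, NumPy, and CUDA if present) each produce their own sequence. The CUDA branch only runs when a GPU is available, and the variable names are chosen for illustration.

```python
import numpy as np
import torch

SEED = 60

# Same seed, same device: the draws repeat.
torch.manual_seed(SEED)
cpu_first = torch.randn(3)
torch.manual_seed(SEED)
cpu_second = torch.randn(3)
print("CPU repeatable:", torch.equal(cpu_first, cpu_second))

# Same seed, different library: deterministic, but a different sequence.
np.random.seed(SEED)
numpy_draw = np.random.randn(3)
print("torch CPU:", cpu_first.tolist())
print("numpy    :", numpy_draw.tolist())

if torch.cuda.is_available():
    # Same seed on the GPU: repeatable there too, but not expected to equal the CPU draws.
    torch.manual_seed(SEED)
    gpu_first = torch.randn(3, device="cuda")
    torch.manual_seed(SEED)
    gpu_second = torch.randn(3, device="cuda")
    print("GPU repeatable:", torch.equal(gpu_first, gpu_second))
    print("torch GPU:", gpu_first.cpu().tolist())
```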

@Deeply Yes. I too would prefer that the PRNG be consistent between CPU and GPU, as I allude to in "Best practices for seeding random numbers on gpu?"

That still doesn’t solve it.
Is there anything wrong with my checking process, described below?
I ran my code twice on the GPU. After optimizer.step(), I used torch.save(model[1]._parameters['weight'].cpu().data, 'w1') to save the weight to disk (for the second run I changed w1 to w2). Then I loaded the two weights using torch.load('w1') and torch.load('w2'), subtracted them, and checked whether the results were all 0.
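The check described above can be sketched as a small helper that reports the largest absolute difference between two saved tensors. The helper name max_abs_difference is hypothetical, and the demo writes to a temporary directory with made-up tensors rather than real training weights.

```python
import os
import tempfile

import torch


def max_abs_difference(path_a: str, path_b: str) -> float:
    """Load two saved tensors and return the largest element-wise absolute difference."""
    a = torch.load(path_a)
    b = torch.load(path_b)
    return (a - b).abs().max().item()


with tempfile.TemporaryDirectory() as tmp:
    w1_path = os.path.join(tmp, "w1")
    w2_path = os.path.join(tmp, "w2")
    weights = torch.randn(4, 4)          # stand-in for a saved layer weight
    torch.save(weights, w1_path)
    torch.save(weights + 1e-3, w2_path)  # artificially perturbed copy
    identical = max_abs_difference(w1_path, w1_path)
    drifted = max_abs_difference(w1_path, w2_path)

print(identical)  # zero when the tensors match bit for bit
print(drifted)    # non-zero for the perturbed copy
```

Reporting the maximum difference rather than a plain equality check makes it easier to see whether the gap is at rounding-error scale or much larger.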

Looks good to me. To search for the problematic part, could you repeat this procedure with random tensors as input, i.e. don’t use your Dataset and DataLoader?
Since you are seeding, the random tensor should be the same in each run.
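One way to try this, sketched under assumptions: a tiny nn.Linear model stands in for the real network, the inputs and targets are seeded random tensors, and everything runs on the CPU. The function name train_on_random_input is made up for this example.

```python
import torch
import torch.nn as nn


def train_on_random_input(seed: int, steps: int = 5) -> torch.Tensor:
    """Run a few SGD steps on seeded random data and return the final weight."""
    torch.manual_seed(seed)      # fixes both the initial weights and the random inputs
    model = nn.Linear(8, 2)      # hypothetical stand-in for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        inputs = torch.randn(16, 8)   # random batch instead of Dataset/DataLoader output
        targets = torch.randn(16, 2)
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return model.weight.detach().clone()


run_1 = train_on_random_input(60)
run_2 = train_on_random_input(60)
max_diff = (run_1 - run_2).abs().max().item()
print(max_diff)
```

If the weights match here but not with the real data pipeline, the Dataset or DataLoader is the likely source of the difference; if they differ even here on the GPU, non-deterministic kernels are the more likely cause.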

Hi! I’m very new to PyTorch. When trying to compare models, I found that the outputs are different.
Simple CNN:

import torch.nn as nn
import torch.nn.functional as F


class NetOne(nn.Module):
    def __init__(self):
        super(NetOne, self).__init__()
        self.c1 = 16
        self.c2 = 8
        self.c3 = 4
        self.size = 32
        self.fclen1 = 419
        self.fclen2 = 10
        self.conv1 = nn.Conv2d(3, self.c1, 3, padding=1)
        self.conv2 = nn.Conv2d(self.c1, self.c2, 3, padding=1)
        self.conv3 = nn.Conv2d(self.c2, self.c3, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(self.c1)
        self.bn2 = nn.BatchNorm2d(self.c2)
        self.bn3 = nn.BatchNorm2d(self.c3)
        self.fc1 = nn.Linear(self.c3 * self.size * self.size, self.fclen2)
        self.fc2 = nn.Linear(self.fclen1, self.fclen2)  # defined but not used in forward()
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.dropout(x)
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = x.view(-1, self.c3 * self.size * self.size)  # flatten to (batch, features)
        x = self.fc1(x)
        return x

And using it:

model1 = NetOne()
model2 = NetOne()

for _, (data, target) in enumerate(train_loader):
    if train_on_gpu:
        data1, target1 = data.cuda(), target.cuda()
        data2, target2 = data.cuda(), target.cuda()
    else:
        data1, target1 = data, target
        data2, target2 = data, target
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    output1 = model1(data1)
    output2 = model2(data2)
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            print('loss error %d : %f %f' % (iterNum, s1[i], s2[i]))

gives a lot of lines like:

loss error 0 : -0.132742 0.350880
loss error 1 : -26.758894 -30.239246
loss error 2 : -10.001531 -12.009613

Can you suggest what’s wrong, please?

Based on your posted code you are initializing two models randomly, so the results are expected to be different.
To get the same results, you should either set the seed before creating an instance of the model or load the state_dict from one model into the other (which I would recommend).
Also, since the models are using dropout layers, you would have to call model.eval() on them to disable it.
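A minimal sketch of the recommended fix, using a small hypothetical TinyNet in place of NetOne: copy the state_dict from one model into the other, switch both to eval mode so dropout is disabled, and compare the outputs on the same batch.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for NetOne: one linear layer followed by dropout."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        return self.dropout(self.fc(x))


model1 = TinyNet()
model2 = TinyNet()                           # starts with different random weights
model2.load_state_dict(model1.state_dict())  # copy weights from model1 into model2

model1.eval()  # eval mode turns dropout into a no-op
model2.eval()

with torch.no_grad():
    batch = torch.randn(5, 8)
    same_output = torch.equal(model1(batch), model2(batch))

print(same_output)
```

Leaving the models in train mode would make the outputs differ again, because each forward pass samples its own dropout mask.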