Why are results with different accuracies generated when state dicts are loaded into the model differently?

I have a model and multiple sets of weights, say the weights from epochs 1 to 5 (w_1, w_2, w_3, w_4, w_5). I notice that different results are generated during inference depending on which of the two following ways I use to load the weights into my model:

My model is a CNN that includes batch norm and ReLU layers.

Method 1:

model = myModel()
model = nn.DataParallel(model)
model = model.cuda()
for idx in range(1, 6):
    state_dict = torch.load(w_idx)  # w_idx: path to the checkpoint for epoch idx
    model.load_state_dict(state_dict)
    model.eval()
    outputs = model(inputs)  # inference

Method 2:

for idx in range(1, 6):
    state_dict = torch.load(w_idx)  # w_idx: path to the checkpoint for epoch idx
    ###############################
    model = myModel()
    model = nn.DataParallel(model)
    model = model.cuda()
    ###############################
    model.load_state_dict(state_dict)
    model.eval()
    outputs = model(inputs)  # inference

May I ask why that is the case? Thank you.

How large are these differences, and are you also seeing them when repeating the same approach?
If so, you might be facing non-deterministic results due to the limited floating-point precision, e.g. if you are using cudnn benchmarking.
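If the differences are tiny (roughly on the order of 1e-6 or smaller), they are likely just floating-point noise: floating-point addition is not associative, so two mathematically equivalent computations that merely accumulate terms in a different order (as different cuDNN convolution algorithms do) can round to slightly different results. A minimal illustration in plain Python:

```python
# Floating-point addition is not associative: summing the same three
# terms in a different order can round to different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

To rule out algorithm-selection effects, you can set `torch.backends.cudnn.benchmark = False` and `torch.backends.cudnn.deterministic = True` before running inference, at the cost of some speed, and check whether the two loading approaches then agree.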