I have a model and several sets of weights, say the weights from epoch 1 to epoch 5 (w_1, w_2, w_3, w_4, w_5). I have noticed that inference produces different results depending on which of the two methods shown below I use to load the weights into the model.

My model is a CNN that includes batch normalization and ReLU layers.

Method 1:

```
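import torch
import torch.nn as nn

# Build the model once and reuse it for every checkpoint.
model = myModel()
model = nn.DataParallel(model)
model = model.cuda()
for idx in range(1, 6):
    # w_idx is a placeholder for the checkpoint of epoch idx (w_1 ... w_5)
    state_dict = torch.load(w_idx)
    model.load_state_dict(state_dict)
    model.eval()
    outputs = model(inputs)  # inference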
```

Method 2:

```
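import torch
import torch.nn as nn

for idx in range(1, 6):
    # w_idx is a placeholder for the checkpoint of epoch idx (w_1 ... w_5)
    state_dict = torch.load(w_idx)
    ###############################
    # Rebuild the model from scratch for every checkpoint.
    model = myModel()
    model = nn.DataParallel(model)
    model = model.cuda()
    ###############################
    model.load_state_dict(state_dict)
    model.eval()
    outputs = model(inputs)  # inference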
```

May I ask why that is the case? Thank you.
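
To help narrow this down, here is a minimal sketch that compares the two loading methods on a toy model. Everything in it is an assumption made for illustration: `TinyCNN` is a hypothetical stand-in for `myModel`, the checkpoints are generated in memory instead of being loaded with `torch.load`, and it runs on CPU without `nn.DataParallel`. It checks two things: that every parameter and buffer in the reused model equals the checkpoint after `load_state_dict`, and how far the outputs of the reused and freshly built models differ.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Hypothetical stand-in for myModel: conv -> batch norm -> ReLU."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


def make_checkpoint(seed):
    """Create a state_dict whose BatchNorm running statistics are non-trivial."""
    torch.manual_seed(seed)
    m = TinyCNN()
    m.train()
    with torch.no_grad():
        m(torch.randn(8, 3, 16, 16))  # one forward pass updates running_mean / running_var
    return {k: v.clone() for k, v in m.state_dict().items()}


# Five in-memory checkpoints standing in for w_1 ... w_5.
checkpoints = [make_checkpoint(seed) for seed in range(1, 6)]
torch.manual_seed(0)
inputs = torch.randn(2, 3, 16, 16)

reused = TinyCNN()  # Method 1: one model, weights replaced in place
all_tensors_match = True
max_diff = 0.0
for state_dict in checkpoints:
    reused.load_state_dict(state_dict)
    reused.eval()

    fresh = TinyCNN()  # Method 2: a new model for every checkpoint
    fresh.load_state_dict(state_dict)
    fresh.eval()

    # After loading, every parameter and buffer should equal the checkpoint.
    for name, tensor in reused.state_dict().items():
        all_tensors_match = all_tensors_match and torch.equal(tensor, state_dict[name])

    with torch.no_grad():
        diff = (reused(inputs) - fresh(inputs)).abs().max().item()
    max_diff = max(max_diff, diff)

print("all tensors match checkpoint:", all_tensors_match)
print("max output difference:", max_diff)
```

If the same comparison on the real model shows matching tensors but different outputs, the difference probably comes from something that is not stored in the `state_dict`, such as plain Python attributes on the model or nondeterministic GPU kernels. That is only a guess, but it would be worth checking.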