Hi, as far as I know, the correct way to build a model for multi-GPU training is:

```
import torch.nn as nn
import torch.optim as optim

model = Model()  # build the model
model = nn.DataParallel(model)  # wrap it for multi-GPU use
model.to(device)  # move the model to the device
optimizer = optim.Adam(model.parameters())  # build the optimizer
```

Now assume I want to load both the model parameters and the optimizer state from a pre-trained checkpoint, so that I can resume training in the multi-GPU case. I am not sure at which point the optimizer state should be loaded:

```
import torch

model = Model()  # build the model on the CPU
checkpoint = torch.load(pretrainedModel)  # load the pre-trained checkpoint
model.load_state_dict(checkpoint['model'])  # restore the model parameters
model = nn.DataParallel(model)  # wrap it for multi-GPU use
model.to(device)  # move the model to the device
optimizer = optim.Adam(model.parameters())  # build the optimizer
```
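
To make the question concrete, here is a minimal sketch of one ordering that I believe should work. It is not a confirmed answer. It assumes the checkpoint also stores the optimizer state under an `'optimizer'` key (my assumption; only `'model'` appears above) and that the model weights were saved from the unwrapped module, so the keys carry no `module.` prefix. The tiny `Model` class and the temporary checkpoint path are hypothetical stand-ins so the snippet runs on its own:

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


class Model(nn.Module):
    """Hypothetical stand-in for the real model."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# First run: take one training step, then save both state dicts.
model = nn.DataParallel(Model()).to(device)
optimizer = optim.Adam(model.parameters())
loss = model(torch.randn(8, 4, device=device)).sum()
loss.backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")  # hypothetical path
torch.save(
    {
        "model": model.module.state_dict(),  # unwrapped: no 'module.' prefix
        "optimizer": optimizer.state_dict(),  # assumed key name
    },
    path,
)

# Resume: restore the weights on the CPU, wrap, move, then build the optimizer
# from the parameters it will actually update, and restore its state last.
resumed = Model()
checkpoint = torch.load(path, map_location="cpu")
resumed.load_state_dict(checkpoint["model"])
resumed = nn.DataParallel(resumed).to(device)
resumed_optimizer = optim.Adam(resumed.parameters())
resumed_optimizer.load_state_dict(checkpoint["optimizer"])
```

The idea behind this ordering is that the optimizer is constructed only after the model is on its final device, and its state is loaded only after it has been constructed. If the checkpoint had instead been saved from the `nn.DataParallel` wrapper, the keys would start with `module.` and the weights would have to be loaded after wrapping.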