The order of using "model.to(device)" and "model = nn.DataParallel(model)"

Assuming we have multiple GPUs, is it better to use

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = net().to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

or

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = net()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

And what is the difference between these two approaches?

Thank you for your answer!