Cuda() & DataParallel

Hi everyone!

If I have a single GPU, are the following functions equivalent?

my_model = model.cuda()
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()])
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()]).cuda()

Thank you!

If you are using a single GPU, these approaches should yield the same results.
However, wrapping the model in an nn.DataParallel will also store the original model in the .module attribute:

model = nn.Linear(1, 1)
print(model)
> Linear(in_features=1, out_features=1, bias=True)
model = nn.DataParallel(model, device_ids=[0])
print(model)
> DataParallel(
  (module): Linear(in_features=1, out_features=1, bias=True)
)
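To make the .module point concrete, here is a minimal CPU-safe sketch (no GPU needed just to construct the wrapper) showing that the original model stays accessible via .module, and that the state_dict keys pick up a "module." prefix, which matters for checkpoints:

```python
import torch.nn as nn

model = nn.Linear(1, 1)
dp_model = nn.DataParallel(model)

# The original (unwrapped) module is accessible via the .module attribute:
print(dp_model.module)  # Linear(in_features=1, out_features=1, bias=True)

# The state_dict keys gain a "module." prefix, which matters when
# saving a checkpoint from a DataParallel model and loading it
# into a plain model later:
print(list(dp_model.state_dict().keys()))  # ['module.weight', 'module.bias']
```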

Do you see any unexpected behavior?


Thanks for the response.

If I do my_model = model.cuda() I have to do model_input = model_input.cuda(), otherwise I get the following message:
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

But if I do:

my_model = DataParallel(model, device_ids=[torch.cuda.current_device()])
or
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()]).cuda()

I don’t need to call model_input = model_input.cuda(); it also works with model_input = model_input.cpu().

Also, for both model.cuda() and DataParallel(model, ...), I don’t have to worry about the map_location in torch.load(path, map_location=…); it can be either 'cpu' or 'cuda'.

I use torch.load() to load the weights of a trained model from a .pt file.
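For anyone hitting this later, here is a small sketch (using a temporary file, not your actual .pt path) of how map_location and the "module." prefix interact when loading a checkpoint saved from a DataParallel model; the helper dict comprehension for stripping the prefix is an illustration, not part of the PyTorch API:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(1, 1))

path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# map_location='cpu' forces the tensors onto the CPU regardless of the
# device they were saved from; handy when loading a GPU-trained
# checkpoint on a CPU-only machine.
state_dict = torch.load(path, map_location='cpu')

# The keys carry the "module." prefix added by DataParallel:
print(list(state_dict.keys()))  # ['module.weight', 'module.bias']

# To load them into a plain (unwrapped) model, strip that prefix:
plain = nn.Linear(1, 1)
plain.load_state_dict(
    {k.replace('module.', '', 1): v for k, v in state_dict.items()}
)
```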