Hi everyone!
If I have a single GPU, are the following approaches equivalent?
my_model = model.cuda()
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()])
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()]).cuda()
Thank you!
If you are using a single GPU, these approaches should yield the same results.
However, wrapping the model in nn.DataParallel
also nests the original model under a .module
attribute:
model = nn.Linear(1, 1)
print(model)
> Linear(in_features=1, out_features=1, bias=True)
model = nn.DataParallel(model, device_ids=[0])
print(model)
> DataParallel(
(module): Linear(in_features=1, out_features=1, bias=True)
)
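One practical consequence of that extra .module level, sketched below: the wrapped model's state_dict keys gain a "module." prefix, which matters when you later load the checkpoint into an unwrapped model. (The construction is safe even on a CPU-only machine, where DataParallel simply falls back to running the wrapped module directly.)

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
dp_model = nn.DataParallel(model)

# The wrapper nests the original model under .module,
# so every parameter key gains a "module." prefix.
print(list(model.state_dict().keys()))     # ['weight', 'bias']
print(list(dp_model.state_dict().keys()))  # ['module.weight', 'module.bias']

# The original model is still reachable via the .module attribute.
print(dp_model.module is model)            # True
```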
Do you see any unexpected behavior?
Thanks for the response.
If I do my_model = model.cuda()
I have to do model_input = model_input.cuda()
, otherwise I get the following message:
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'
But if I do:
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()])
or
my_model = DataParallel(model, device_ids=[torch.cuda.current_device()]).cuda()
I don’t need to do model_input = model_input.cuda(); it also works with model_input = model_input.cpu().
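That behavior comes from DataParallel.forward itself scattering the inputs onto the configured devices, so a CPU input tensor is accepted; with plain model.cuda() you have to move the input manually. A minimal sketch (guarded so it also runs on a CPU-only machine, where DataParallel just calls the wrapped module directly):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)

if torch.cuda.is_available():
    # The parameters must live on the GPUs, but the inputs may stay on the CPU:
    # DataParallel scatters them to the devices during forward.
    dp_model = nn.DataParallel(model).cuda()
else:
    # Without CUDA, DataParallel falls back to running the module on the CPU.
    dp_model = nn.DataParallel(model)

model_input = torch.randn(4, 1)  # a CPU tensor in both cases
output = dp_model(model_input)
print(output.shape)  # torch.Size([4, 1])
```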
Also, for both model.cuda()
and DataParallel(model, ...)
I don’t have to specify map_location
in torch.load(path, map_location = …); it can be either 'cpu'
or 'cuda'.
I use torch.load() to load the weights of a trained model from a .pt file.
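For reference, a minimal round trip covering both points above (the temporary path is just for illustration): saving a DataParallel-wrapped model's state_dict, loading it with map_location='cpu' so the checkpoint opens even on a machine without a GPU, and stripping the "module." prefix before loading into an unwrapped model.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
dp_model = nn.DataParallel(model)

# Save the wrapped model's weights; the keys carry a "module." prefix.
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(dp_model.state_dict(), path)

# map_location='cpu' remaps any GPU-saved tensors onto the CPU.
state_dict = torch.load(path, map_location="cpu")

# Strip the "module." prefix so the keys match an unwrapped model.
plain_state = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
fresh_model = nn.Linear(1, 1)
fresh_model.load_state_dict(plain_state)
```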