But why is it even necessary? If a model is on CUDA and you call `model.cuda()`, it should be a no-op, and if the model is on CPU and you call `model.cpu()`, it should also be a no-op.
It’s necessary if you want to make the code compatible with machines that don’t support CUDA. E.g., if you call `model.cuda()` or `sometensor.cuda()` on such a machine, you will get a `RuntimeError`.
Personally, I develop and debug 99% of the code on macOS, and then sync it over to a headless cluster, which is why this pattern is useful to me, for example.
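A minimal sketch of that guard pattern (the toy `nn.Linear` model is just a stand-in for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for any model

# Guard the .cuda() call so the same script also runs on machines
# without CUDA support, where model.cuda() would raise a RuntimeError.
if torch.cuda.is_available():
    model = model.cuda()

on_gpu = next(model.parameters()).is_cuda
```

This way the script works unchanged on a CUDA-less laptop and on a GPU cluster.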
> if there’s a new attribute similar to model.device as is the case for the new tensors in 0.4.
Yes, e.g., you can now specify the device once at the top of your script:
```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
and then for the model, you can use
```python
model = model.to(device)
```
The same applies to tensors, e.g.:
```python
for features, targets in data_loader:
    features = features.to(device)
    targets = targets.to(device)
```
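Putting the pieces together, a runnable sketch of the 0.4-style pattern (the `nn.Linear` model and the hand-built batch list are stand-ins for a real model and `DataLoader`):

```python
import torch
import torch.nn as nn

# Pick the device once at the top of the script.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # moves the parameters to `device`

# Stand-in for a real DataLoader: two (features, targets) batches
# with made-up shapes, purely for illustration.
data_loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))
               for _ in range(2)]

for features, targets in data_loader:
    # .to(device) is a no-op if the tensor already lives on `device`,
    # so the same loop works on both CPU-only and CUDA machines.
    features = features.to(device)
    targets = targets.to(device)
    logits = model(features)
```

Since both the parameters and the batches end up on `device`, the forward pass works identically on either backend.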