The docs say you should pass a device parameter around, create your tensors with device=device (or move them with .to(device)), and call .cuda() on the model.
However, I typically want to use either the CPU or the GPU for everything, so if I want the GPU it seems easier to just do this once at the start:

torch.set_default_tensor_type(torch.cuda.FloatTensor)
This avoids passing a device parameter around and sprinkling the code with .to(device) calls, any one of which can easily be forgotten. Is there anything wrong with this approach?
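The mechanism is easy to sanity-check on CPU by swapping in a different default tensor type; with torch.cuda.FloatTensor the same call makes every new float tensor land on the GPU instead. A minimal sketch, using torch.DoubleTensor as a CPU stand-in:

```python
import torch

# Out of the box, new float tensors default to torch.FloatTensor (float32, CPU).
assert torch.tensor([1.0]).dtype == torch.float32

# Changing the default affects every float tensor created afterwards;
# torch.cuda.FloatTensor would likewise place them all on the GPU.
torch.set_default_tensor_type(torch.DoubleTensor)
assert torch.tensor([1.0]).dtype == torch.float64

# Put the usual default back.
torch.set_default_tensor_type(torch.FloatTensor)
```

(Note that torch.set_default_tensor_type is deprecated in recent PyTorch releases in favour of torch.set_default_dtype and torch.set_default_device.)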
I have found a problem: it fails with multiprocessing. A DataLoader with num_workers=0 works fine, but with num_workers>0 it fails with a CUDA initialization error, because CUDA cannot be re-initialized in a forked worker process. It fails even if the DataLoader was created earlier: the error occurs as soon as an iterator is created, for example when a training loop does "for x in dl".
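One workaround (my assumption, not something from the original post): the crash comes from fork-based workers inheriting a process that has already initialized CUDA, so asking the DataLoader for spawn-based workers sidesteps it, at the cost of slower worker startup. A sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(8.0))

# "spawn" starts each worker in a fresh interpreter instead of forking,
# so workers do not inherit the parent's CUDA state and can initialize
# CUDA cleanly themselves.
dl = DataLoader(dataset, batch_size=2, num_workers=2,
                multiprocessing_context="spawn")
```

With spawn, iteration should happen under an if __name__ == "__main__": guard, since the workers re-import the main module.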
from contextlib import contextmanager

import torch

@contextmanager
def set_default_tensor_type(tensor_type):
    # Probe with a float: torch.tensor(0) would be an integer tensor,
    # which the default tensor type does not govern.
    if torch.tensor(0.0).is_cuda:
        old_tensor_type = torch.cuda.FloatTensor
    else:
        old_tensor_type = torch.FloatTensor
    torch.set_default_tensor_type(tensor_type)
    try:
        yield
    finally:
        # Restore the previous default even if the body raises.
        torch.set_default_tensor_type(old_tensor_type)
...
for data in data_loader:
    with set_default_tensor_type(torch.cuda.FloatTensor):
        run_model(data)
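To check that the context manager really restores the previous default, the same pattern can be exercised on CPU, with torch.DoubleTensor standing in for torch.cuda.FloatTensor (a self-contained copy of the helper, with try/finally so the default is restored even if the body raises):

```python
from contextlib import contextmanager

import torch

@contextmanager
def set_default_tensor_type(tensor_type):
    # Remember the current default float tensor type.
    if torch.tensor(0.0).is_cuda:
        old_tensor_type = torch.cuda.FloatTensor
    else:
        old_tensor_type = torch.FloatTensor
    torch.set_default_tensor_type(tensor_type)
    try:
        yield
    finally:
        # Restore the previous default, error or not.
        torch.set_default_tensor_type(old_tensor_type)

# Inside the block, new float tensors use the requested type...
with set_default_tensor_type(torch.DoubleTensor):
    assert torch.tensor([1.0]).dtype == torch.float64

# ...and the previous default comes back on exit.
assert torch.tensor([1.0]).dtype == torch.float32
```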