Hi, I used to have a single GPU, but now that I have two, I tried to run my code on cuda:1 instead of cuda:0, which I normally use.
However, I ran into the following error:
File "/Hard_3rd/harry/TOF_hj_0306/train/model_trainers/trainer_CU_MixRes_scale.py", line 297, in _train_epoch
for step, data in data_loader:
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
for obj in iterable:
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
return self._process_next_batch(batch)
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 68, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 68, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/user/anaconda3/envs/TOF/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 42, in default_collate
out = batch[0].new(storage)
RuntimeError: Attempted to set the storage of a tensor on device "cuda:1" to a storage on different device "cuda:0". This is no longer allowed; the devices must match.
I guess the issue comes from the default collate_fn
trying to put the data on cuda:0 when it is already on cuda:1. How can I stop this from happening? Is there a way I can keep using the default collate_fn
while still running my code on cuda:1?
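For context, the pattern I'm considering as a workaround is roughly this (a simplified sketch, not my actual dataset or model): have __getitem__ return plain CPU tensors so the default collate_fn never touches CUDA storage, and move each collated batch to cuda:1 inside the training loop. The ToyDataset and the device-fallback line are just placeholders for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Stand-in dataset that returns CPU tensors from __getitem__."""
    def __init__(self, n=8):
        self.data = torch.randn(n, 3)
        self.targets = torch.arange(n)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Stay on CPU here; default_collate then only stacks CPU storages.
        return self.data[idx], self.targets[idx]

# Fall back to CPU when a second GPU is not present (illustration only).
device = torch.device('cuda:1' if torch.cuda.device_count() > 1 else 'cpu')

loader = DataLoader(ToyDataset(), batch_size=4)  # default collate_fn

for inputs, targets in loader:
    # Move the whole collated batch to the target device in one place.
    inputs = inputs.to(device)
    targets = targets.to(device)
```

Is this the recommended way to do it, or is there a setting that makes the default collate_fn device-aware?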