I’m trying to use a DataLoader whose dataset performs some operations (neural rendering) on the GPU and outputs the result. I can’t iterate over the DataLoader because it throws “RuntimeError: Caught RuntimeError in DataLoader worker process 0.” I changed num_workers to 0, but that did not help: my code then produced no output at all (no print statements, no error messages). Here is the code I’m referencing: DualAttentionAttack/data_loader.py at main · nlsde-safety-team/DualAttentionAttack · GitHub
Hi,
Please try to post a minimum executable snippet that reproduces your error enclosing it within ```.
Such errors are sometimes also caused during sampling. Make sure the indices generated by your Sampler are aligned with your dataset’s indices.
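One way to check the sampler advice above is to walk the sampler once and confirm every index is valid for the dataset. This is only a minimal sketch: `ToyDataset` and `sampler_indices_valid` are hypothetical names made up for illustration, not part of the code in question.

```python
import torch
from torch.utils.data import Dataset, SequentialSampler, SubsetRandomSampler


class ToyDataset(Dataset):
    """Hypothetical stand-in dataset holding ten integer samples."""

    def __init__(self, n=10):
        self.data = torch.arange(n)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]


def sampler_indices_valid(dataset, sampler):
    """Return True if every index the sampler yields is valid for the dataset."""
    n = len(dataset)
    return all(0 <= int(i) < n for i in sampler)


dataset = ToyDataset()
good_sampler = SequentialSampler(dataset)          # yields 0, 1, ..., 9
bad_sampler = SubsetRandomSampler([0, 5, 12])      # 12 is out of range here

print(sampler_indices_valid(dataset, good_sampler))
print(sampler_indices_valid(dataset, bad_sampler))
```

If the check fails, the out-of-range index would surface as an error inside a worker as soon as the DataLoader asks the dataset for that sample.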
Here is a minimal version of the dataloader setup:
import glob

from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_image

class MyDataset(Dataset):
    def __init__(self):
        self.image_paths = glob.glob('./dataset/**.png')
        # a torch.nn.Module-like model
        self.renderer = NeuralRenderer(img_size=(225,225)).cuda()

    def __getitem__(self, index):
        img = read_image(self.image_paths[index]).cuda()
        return self.renderer(img)

dataset = MyDataset()
loader = DataLoader(dataset=dataset, batch_size=1, shuffle=False, num_workers=0)
I see the runtime error when I iterate over the loader.
Please ensure image_paths isn’t empty and that you are able to retrieve files by indexing it, e.g. image_paths[0].
Please post here if you still face the error.
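A quick way to run that check is to fail early when the glob pattern matches nothing. The sketch below uses only the standard library; `make_tree` and `list_pngs` are hypothetical helpers, and the temporary directory stands in for the real dataset folder. It also shows that a pattern such as `**.png` only matches files directly inside the folder, because `**` is recursive only when it is a whole path component and `recursive=True` is passed.

```python
import glob
import os
import tempfile


def make_tree(root):
    """Create root/a.png and root/sub/b.png as empty placeholder files."""
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.png", os.path.join("sub", "b.png")):
        open(os.path.join(root, rel), "w").close()


def list_pngs(root, recursive=False):
    """Return sorted .png paths under root, raising if none are found."""
    if recursive:
        pattern = os.path.join(root, "**", "*.png")
    else:
        pattern = os.path.join(root, "*.png")
    paths = sorted(glob.glob(pattern, recursive=recursive))
    if not paths:
        raise FileNotFoundError(f"no .png files matched {pattern!r}")
    return paths


with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    flat_names = [os.path.basename(p) for p in list_pngs(root)]
    deep_names = [os.path.basename(p) for p in list_pngs(root, recursive=True)]
    # '**.png' without recursive=True behaves like '*.png'.
    star_star_names = [
        os.path.basename(p) for p in glob.glob(os.path.join(root, "**.png"))
    ]

print(flat_names, deep_names, star_star_names)
```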
I’ve ensured the path isn’t empty.
I’m able to access the images at any given index.
Please implement the __len__() function in your MyDataset class.
(This function just returns the length of your dataset and is expected by many implementations of the Sampler class and the DataLoader.)
If it still doesn’t work, please show the full error message that you are getting.
Maybe this could help:
class MyDataset(Dataset):
    def __init__(self):
        ...  # body
    def __getitem__(self, index):
        ...  # body
    def __len__(self):
        return len(self.image_paths)
Thanks for the reply. I only provided a very stripped-down version of a rather large implementation, and the full version does have __len__(). Could the problem be that some pre-processing happens on the GPU before the DataLoader can collate the dataset into batches?
Traceback (most recent call last):
File "train.py", line 373, in <module>
run_cam(test_dir, EPOCH)
File "train.py", line 279, in run_cam
for i,j in enumerate(loader):
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nsambhu/github/DualAttentionAttack/src/data_loader.py", line 87, in __getitem__
imgs_pred = self.mask_renderer.forward(self.vertices_var, self.faces_var, self.textures)
File "/home/nsambhu/github/DualAttentionAttack/src/nmr_test.py", line 249, in forward
return self.RenderFunc(vertices, faces, textures)
File "/home/nsambhu/github/DualAttentionAttack/src/nmr_test.py", line 167, in forward
vs = vertices.cuda().numpy() # 11/27/2022 1:28:46 PM: Neil added
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/cuda/__init__.py", line 195, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
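The last line of the traceback names the 'spawn' start method. As a hedged sketch of what that could look like, `DataLoader` accepts a `multiprocessing_context` argument; `ToyDataset` and `build_loader` below are hypothetical names, and the dataset deliberately does CPU-only work. Whether spawning workers that each use CUDA is a good idea for the renderer in question is a separate matter and is not established by this thread.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical picklable dataset; a stand-in for the real one."""

    def __init__(self, n=8):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # CPU-only work here; a real dataset would load and transform a sample.
        return torch.tensor([float(index)])


def build_loader(dataset, num_workers):
    """Build a DataLoader whose workers, if any, are started with 'spawn'."""
    # multiprocessing_context may only be set when num_workers > 0.
    context = "spawn" if num_workers > 0 else None
    return DataLoader(
        dataset,
        batch_size=2,
        shuffle=False,
        num_workers=num_workers,
        multiprocessing_context=context,
    )


if __name__ == "__main__":
    # With 'spawn', worker start-up re-imports this module, so the code that
    # creates and iterates the loader belongs under this guard.
    loader = build_loader(ToyDataset(), num_workers=2)
    print(loader.multiprocessing_context.get_start_method())
```

With 'spawn', the dataset object is pickled and sent to each worker, so it has to be picklable and defined in an importable module.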
In that case, have you tried removing the .cuda() calls from the __init__ and __getitem__ methods? Does it work then?
Unfortunately, it still fails:
Traceback (most recent call last):
File "train.py", line 373, in <module>
run_cam(test_dir, EPOCH)
File "train.py", line 279, in run_cam
for i,j in enumerate(loader):
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nsambhu/github/DualAttentionAttack/src/data_loader.py", line 97, in __getitem__
imgs_pred = self.mask_renderer.forward(self.vertices_var, self.faces_var, self.textures)
File "/home/nsambhu/github/DualAttentionAttack/src/nmr_test.py", line 249, in forward
return self.RenderFunc(vertices, faces, textures)
File "/home/nsambhu/github/DualAttentionAttack/src/nmr_test.py", line 167, in forward
vs = vertices.cuda().numpy() # 11/27/2022 1:28:46 PM: Neil added
File "/home/nsambhu/.conda/envs/dualattentionattack/lib/python3.7/site-packages/torch/cuda/__init__.py", line 195, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Remove all CUDA calls from your Dataset.__getitem__ method, as @srishti-git1110 already mentioned. The current code fails when it tries to re-initialize the CUDA context in a new process, because you are moving a tensor to the GPU in:
vs = vertices.cuda().numpy()
and directly afterwards to the CPU, which also does not sound necessary.
If you want to use a NumPy array, just get rid of the .cuda() call to avoid moving the tensor back and forth.
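A minimal sketch of that advice, under stated assumptions: the dataset returns CPU tensors only, and the GPU-side step runs in the main process on each collated batch. `CpuOnlyDataset` and `run_on_device` are hypothetical names, random tensors stand in for images read from disk, and `nn.Conv2d` is just a placeholder for the renderer, whose real interface is not shown in this thread.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class CpuOnlyDataset(Dataset):
    """Hypothetical dataset that returns CPU tensors and never touches CUDA."""

    def __init__(self, num_samples=8, img_size=(3, 16, 16)):
        # Random tensors stand in for images that would be read from disk.
        self.images = [torch.rand(img_size) for _ in range(num_samples)]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index]


def run_on_device(loader, model, device):
    """Move each collated batch to `device` and apply `model` in the main process."""
    model = model.to(device).eval()
    outputs = []
    with torch.no_grad():
        for batch in loader:
            batch = batch.to(device)
            outputs.append(model(batch).cpu())
    return torch.cat(outputs)


# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Placeholder for the GPU-side module; not the renderer from the thread.
renderer_stand_in = nn.Conv2d(3, 1, kernel_size=3, padding=1)
loader = DataLoader(CpuOnlyDataset(), batch_size=4, shuffle=False, num_workers=0)
result = run_on_device(loader, renderer_stand_in, device)
print(result.shape)
```

Because `__getitem__` no longer initializes CUDA, the same dataset can also be used with `num_workers > 0` and the default fork start method.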