When I use a DataLoader, I encounter the problem described in the title.
My code is as follows:
def __getitem__(self, index):
    record = self.video_list[index]
    if self.is_train:
        data = self.feature_list[index]
    else:
        data = self.feature_list[record.id]
    print('sample the video size is {}'.format(data.size()))
    # Here is the error
    video_feature = data.mean(dim=0, keepdim=True).view(-1)
    return video_feature
self.feature_list is a list of tensors storing my training data.
Here is the error information:
sample the video size is torch.Size([10, 2048])
train(model, train_loader, criterion, optimizer, ep)
File "train_c3d.py", line 110, in train
for i, (input, target) in enumerate(train_loader):
File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
return self._process_next_batch(batch)
File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/media/data/kmy/envs/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/media/data/kmy/MBA/fea_dataset.py", line 54, in __getitem__
video_feature = data.mean(dim=0, keepdim=True).view(-1)
RuntimeError: CUDA error (3): initialization error
I’m glad it’s working.
However, I would like to make sure to understand the issue properly.
Usually you would lazily load the data onto the CPU in your Dataset and batch it with a DataLoader.
Then, in the training loop, you would push the data and target to the GPU.
With this approach you use a minimal amount of your (limited) GPU memory to store the data.
If you instead load all the data onto the GPU, you waste GPU memory, which might limit your model size, etc.
That approach can still be alright if your model is quite small, but I would generally not recommend it.
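To make the recommended pattern concrete, here is a minimal sketch (the `FeatureDataset` class and the random tensors standing in for real features are hypothetical, just for illustration): the Dataset only ever holds and returns CPU tensors, and each batch is moved to the GPU inside the training loop.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FeatureDataset(Dataset):
    """Hypothetical dataset: keeps all features as CPU tensors."""
    def __init__(self, num_samples=8, num_frames=10, feat_dim=2048):
        # CPU tensors only -- no .cuda() calls inside the Dataset,
        # so DataLoader workers never touch the CUDA context
        self.feature_list = [torch.randn(num_frames, feat_dim)
                             for _ in range(num_samples)]

    def __len__(self):
        return len(self.feature_list)

    def __getitem__(self, index):
        data = self.feature_list[index]  # stays on the CPU
        return data.mean(dim=0, keepdim=True).view(-1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(FeatureDataset(), batch_size=4, num_workers=0)
for batch in loader:
    batch = batch.to(device)  # push to the GPU here, in the loop
    # ... forward / backward ...
```

Because the `.cuda()`/`.to(device)` call happens in the main process, worker subprocesses never initialize CUDA, which avoids the "CUDA error (3): initialization error" from the traceback above.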
I see the problem now, and I have modified my code.
Now my code is as below:
model = nn.DataParallel(model).cuda()
...
for i, (input, target, num_frames) in enumerate(train_loader):
    target = target.cuda(non_blocking=True)
    ...
    # do some processing on the input
    output = model(input.cuda(), seg_num_list)
Is that right? Actually, I noticed that sending the input to the model without calling cuda() works fine when using DataParallel. I guess DataParallel moves the input to the GPUs automatically inside its forward pass?
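That matches the behavior of nn.DataParallel: its forward pass scatters the input across the configured devices, so a CPU input tensor is moved to the GPUs for you. A minimal sketch (the nn.Linear stand-in model is hypothetical; the sketch also runs on CPU when no GPU is available):

```python
import torch
import torch.nn as nn

model = nn.Linear(2048, 10)  # hypothetical stand-in model
if torch.cuda.is_available():
    # DataParallel wraps the model; its forward() scatters inputs
    # across the available GPUs before calling the wrapped module
    model = nn.DataParallel(model).cuda()

x = torch.randn(4, 2048)  # CPU tensor, no explicit .cuda() call
out = model(x)            # DataParallel moves/scatters x as needed
```

Calling `input.cuda()` yourself also works; DataParallel will still split the batch along dimension 0 across the device list.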
Hi @ptrblck_de,
I am new to Python and PyTorch specifically…
I have the same error, but it seems that CUDA doesn't work at all, because I get the same issue even when running the commands above. Do you have any idea how to solve this one?