Hi,
When I checked my GPU usage, I found that the GPU is often idle, which slows down the simulation.
In particular, the idle time comes from the data transfer from CPU to GPU:
    videos = videos.cuda()
    questions = questions.cuda()
    answers = answers.cuda()
The main forward & backward computation took around 0.55 seconds, but the data transfer took 0.25 seconds…
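For reference, here is a minimal sketch of how a transfer like this can be timed. CUDA calls are queued asynchronously, so the sketch synchronizes before reading the clock; without that, the copy time can end up attributed to a later step. The helper name `timed_to_device` and the batch shape are assumptions for illustration only, not my actual code, and the sketch falls back to the CPU when no GPU is present.

```python
import time

import torch


def timed_to_device(tensor, device):
    """Copy `tensor` to `device` and return (moved_tensor, elapsed_seconds)."""
    use_cuda = device.type == "cuda"
    if use_cuda:
        torch.cuda.synchronize()  # let previously queued GPU work finish first
    start = time.perf_counter()
    moved = tensor.to(device)
    if use_cuda:
        torch.cuda.synchronize()  # wait until the copy itself has completed
    return moved, time.perf_counter() - start


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
videos = torch.randn(8, 16, 3, 32, 32)  # hypothetical batch shape
videos_dev, seconds = timed_to_device(videos, device)
print(f"transfer took {seconds:.4f} s on {device}")
```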
I tried to solve this problem by calling cuda() inside a custom collate function:
    def SeqCollate(batch):
        videos, questions, answers = zip(*batch)
        videos = torch.cat(videos, 0)
        questions = torch.cat(questions, 0)
        answers = torch.cat(answers, 0)
        return (videos.cuda(), questions.cuda(), answers.cuda())
But I got the following error:
“RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the ‘spawn’ start method”
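As far as I understand, the error appears because the collate function runs inside the DataLoader worker processes, which are forked and cannot initialize CUDA. One alternative I am aware of is to keep the collate function CPU-only, ask the DataLoader for pinned memory, and copy with `non_blocking=True` in the main process. Below is a minimal sketch of that idea, not a verified fix for my setup: `ToyDataset`, `seq_collate_cpu`, and all shapes are hypothetical stand-ins, and `num_workers=0` is used only so the sketch runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the real video QA dataset."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        video = torch.randn(1, 4, 3, 8, 8)        # assumed shape
        question = torch.randint(0, 100, (1, 5))  # assumed shape
        answer = torch.tensor([idx % 3])
        return video, question, answer


def seq_collate_cpu(batch):
    # Same concatenation as SeqCollate, but the tensors stay on the CPU,
    # so worker processes never touch CUDA.
    videos, questions, answers = zip(*batch)
    return torch.cat(videos, 0), torch.cat(questions, 0), torch.cat(answers, 0)


use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

loader = DataLoader(
    ToyDataset(),
    batch_size=4,
    collate_fn=seq_collate_cpu,
    num_workers=0,        # raise this in real training; workers remain CPU-only
    pin_memory=use_cuda,  # page-locked host memory allows asynchronous copies
)

shapes = []
for videos, questions, answers in loader:
    # non_blocking=True only overlaps with compute when the source is pinned.
    videos = videos.to(device, non_blocking=True)
    questions = questions.to(device, non_blocking=True)
    answers = answers.to(device, non_blocking=True)
    shapes.append(tuple(videos.shape))

print(shapes)
```

Whether this hides the whole 0.25 seconds presumably depends on how much of it is the copy itself and how much is batch preparation on the CPU.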
Could you kindly let me know how I can solve the data transfer bottleneck from CPU to GPU?
Thanks for reading my question!