Hi - I have been using PyTorch for a long time. I use PyTorch's DataLoader to mini-batch my dataset. I then loop over the mini-batches of training and test samples, move them to the GPU, and call my model on each mini-batch to run the forward and backward pass, as shown below:
for i, (trInput, trOutput) in enumerate(dataLoader):
    # Move the current mini-batch from CPU to the configured device (GPU)
    trInput = trInput.to(self.device)
    trOutput = trOutput.to(self.device)
    # ... forward pass, loss computation, backward pass ...
I was under the impression that the DataLoader loads the whole dataset onto the GPU and then splits it into mini-batches, so that there wouldn't be any data transfer between the CPU and GPU. But it looks like this is not the case: when I run my GPU code I see 100% CPU utilization, which suggests a lot of host-side I/O and CPU-to-GPU copies.
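For example, peeking at one batch (a quick sanity check; batch_in and batch_out are just illustration names, and dataLoader is my standard torch.utils.data.DataLoader) confirms that the tensors come back on the CPU:

batch_in, batch_out = next(iter(dataLoader))
print(batch_in.device)  # prints "cpu": the DataLoader yields CPU tensors,
                        # so each .to(self.device) call copies data to the GPU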
Having said that, how should I change my code to:
- load the whole training set into GPU memory first
- create the mini-batches on the GPU
- train the model
I'm looking to avoid any data transfer from the CPU to the GPU during training; a rough sketch of what I have in mind is below. Should I use a different type of data loader? Any help would be appreciated.
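To make the question concrete, here is a minimal sketch of what I'm imagining, assuming the whole training set fits in GPU memory (trainX, trainY, batchSize, and numEpochs are placeholder names, not from my actual code):

import torch

# Assumed: the full training set as two tensors small enough for GPU memory
trainX = trainX.to(self.device)  # copy the whole dataset to the GPU once
trainY = trainY.to(self.device)

nSamples = trainX.size(0)
for epoch in range(numEpochs):
    # Shuffle indices on the GPU so each epoch needs no CPU-GPU traffic
    perm = torch.randperm(nSamples, device=self.device)
    for start in range(0, nSamples, batchSize):
        idx = perm[start:start + batchSize]
        trInput = trainX[idx]    # mini-batch sliced directly on the GPU
        trOutput = trainY[idx]
        # ... forward pass, loss, backward pass as before ...

Is slicing like this the right approach, or is there a DataLoader-based way to achieve the same thing?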
Thank you,
Tomojit