CPU usage in GPU based PyTorch code

Hi - I have been using PyTorch for a long time. I use Torch's DataLoader to split my dataset into mini-batches. I then loop through each mini-batch of training and test samples, move them to the GPU, and call my model on each mini-batch to run the forward and backward pass, as shown below:
for i, (trInput, trOutput) in enumerate(dataLoader):
    # Move tensors to the configured device
    trInput = trInput.to(self.device)
    trOutput = trOutput.to(self.device)

I was under the impression that the DataLoader moves the whole dataset to the GPU and then splits it into mini-batches, so that there wouldn't be any per-batch transfer of data between CPU and GPU. But it looks like this is not the case. When I run my GPU code I see 100% CPU utilization, which suggests a lot of host-side I/O per batch.

Having said that, how should I change my code to:

  1. First load the whole training set into GPU memory
  2. Mini-batch the data on the GPU
  3. Train the model

I'm looking to avoid any data transfer from CPU to GPU during training. Should I use a different type of data loader? Any help will be appreciated.

Thank you,

In your DataLoader loop you are explicitly moving the tensors to self.device, so I assume you are sticking to the standard workflow of loading the data on the host in order to avoid wasting GPU memory on the dataset (if it even fits onto the GPU).
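As an aside, if you do keep the dataset on the host, the usual way to cheapen the per-batch copies is pinned (page-locked) memory plus asynchronous transfers. A minimal sketch, using a synthetic TensorDataset as a stand-in for your data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Synthetic stand-in data; replace with your own tensors
ds = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))

# pin_memory=True places each batch in page-locked host memory,
# which allows the copy below to run asynchronously
loader = DataLoader(ds, batch_size=32,
                    pin_memory=torch.cuda.is_available())

for x, y in loader:
    # non_blocking=True overlaps the copy with GPU compute
    # when the source batch is pinned
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
```

This doesn't eliminate the host-to-device transfer, but it hides much of its cost behind the forward/backward pass.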

You can load the entire dataset onto the GPU in the Dataset.__init__ and then just index each sample in __getitem__.
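A rough sketch of that approach (the class name and synthetic tensors are placeholders; adapt them to your data):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class GPUDataset(Dataset):
    """Holds the full dataset on the target device; __getitem__
    then indexes GPU tensors directly, with no per-batch copy."""
    def __init__(self, inputs, targets, device):
        # one-time host-to-device copy of the whole dataset
        self.inputs = inputs.to(device)
        self.targets = targets.to(device)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # pure on-device indexing, no CPU<->GPU transfer
        return self.inputs[idx], self.targets[idx]

device = "cuda" if torch.cuda.is_available() else "cpu"
ds = GPUDataset(torch.randn(1000, 10), torch.randn(1000, 1), device)

# num_workers must stay 0: CUDA tensors cannot be served
# from DataLoader worker processes
loader = DataLoader(ds, batch_size=32, shuffle=True, num_workers=0)

for x, y in loader:
    pass  # x and y are already on `device`
```

Note that the default collate still stacks the indexed samples into a batch, which happens on the GPU here, so the only transfer is the single copy in __init__.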