I'm facing the issue of low GPU utilization (only around 15%) and high GPU memory utilization with PyTorch on Windows 10. I have already tried to optimize the DataLoader.
The data loading has the following structure: first, a list of the image paths is collected.
```python
self.filenames = [os.path.join(dp, f)
                  for dp, dn, fn in os.walk(os.path.expanduser(self.images_root))
                  for f in fn if is_image(f)]
self.filenames.sort()
self.filenamesGt = [os.path.join(dp, f)
                    for dp, dn, fn in os.walk(os.path.expanduser(self.labels_root))
                    for f in fn if is_image(f)]
self.filenamesGt.sort()
```
Then the images are opened with PIL:
```python
def __getitem__(self, idx):
    # read input image and ground-truth label
    filename = self.filenames[idx]
    filenameGt = self.filenamesGt[idx]
    image_rgb = Image.open(filename)
    image_Gt = Image.open(filenameGt)
```
Afterwards, torchvision transforms are applied, ending with a conversion via ToTensor(). The resulting tensors are then passed to a function containing some OpenCV and NumPy operations, which produces the final input tensor returned by __getitem__:
```python
input_np = self.create_rgbdm(image_rgb.squeeze(0).numpy().transpose(1, 2, 0),
                             image_Gt.squeeze(0).numpy())
input_tensor = transforms.ToTensor()(input_np)
```
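For context, the shape handling in this call can be sketched with NumPy alone; `create_rgbdm` below is a hypothetical stand-in (the real function contains OpenCV operations) that just appends the label as an extra channel:

```python
import numpy as np

# ToTensor() yields a CHW float tensor; after .squeeze(0).numpy().transpose(1, 2, 0)
# the RGB image is back in HWC layout, and the label is a plain HxW array.
chw_rgb = np.random.rand(3, 4, 5).astype(np.float32)   # C, H, W
hw_gt = np.random.rand(4, 5).astype(np.float32)        # H, W

hwc_rgb = chw_rgb.transpose(1, 2, 0)                   # H, W, C

def create_rgbdm(rgb_hwc, gt_hw):
    # hypothetical stand-in: stack the label as a fourth channel
    return np.concatenate([rgb_hwc, gt_hw[..., None]], axis=2)

input_np = create_rgbdm(hwc_rgb, hw_gt)
print(input_np.shape)  # (4, 5, 4)
```

So each sample goes tensor -> NumPy -> tensor, with the preprocessing sandwiched in between.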
The DataLoader arguments in the training script are the following:
```python
train_data_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=8,
    pin_memory=True)
```
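To check whether loading really is the bottleneck, I time how long fetching each batch takes; this is the rough helper I use (it works on any iterable, so a plain list stands in for the DataLoader here):

```python
import time

def profile_loader(loader, max_batches=50):
    """Rough check: average time spent waiting for each batch.
    If this dominates the training step time, the input pipeline
    (not the GPU) is the bottleneck."""
    times = []
    it = iter(loader)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        times.append(time.perf_counter() - t0)
    return sum(times) / max(len(times), 1)

# works with any iterable, e.g. a torch DataLoader
avg = profile_loader([1, 2, 3, 4])
```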
The problem is that changing num_workers from 0 to 2, 4, or 8 does not significantly decrease the data loading time or solve the GPU utilization problem. Is the custom preprocessing function self.create_rgbdm() at the end the problem when the DataLoader runs with multiple workers? Should the function be called outside of __getitem__? Or could something else be the reason?
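For completeness: on Windows, DataLoader workers are started with the spawn method, so the entry point of the training script lives under an `if __name__ == "__main__"` guard. A minimal sketch of that structure (`main` is just a placeholder name):

```python
# On Windows, DataLoader workers are created with the "spawn" start method,
# so the code that builds the DataLoader and runs the training loop must
# be guarded; otherwise each worker re-imports and re-executes the script.
def main():
    # build dataset / DataLoader and run the training loop here
    ...

if __name__ == "__main__":
    main()
```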
PyTorch version: 1.5 with CUDA 10.1
GPU: NVIDIA RTX 2080 Ti