Hi,
I'm facing low GPU utilisation (only around 15%) and high GPU memory utilisation with PyTorch on Windows 10. I have already tried to optimize the DataLoader.
The data loading pipeline has the following structure: first, a list of the image paths is collected:
self.filenames = [os.path.join(dp, f)
                  for dp, dn, fn in os.walk(os.path.expanduser(self.images_root))
                  for f in fn if is_image(f)]
self.filenames.sort()

self.filenamesGt = [os.path.join(dp, f)
                    for dp, dn, fn in os.walk(os.path.expanduser(self.labels_root))
                    for f in fn if is_image(f)]
self.filenamesGt.sort()
Then the images are opened with PIL:
def __getitem__(self, idx):
    # read input image and ground truth
    filename = self.filenames[idx]
    filenameGt = self.filenamesGt[idx]
    image_rgb = Image.open(filename)
    image_Gt = Image.open(filenameGt)
Afterwards, torchvision transforms are applied, with a conversion via ToTensor() at the end. The resulting torch tensors are then passed to a function containing some OpenCV and numpy operations, which produces the final input tensor returned by __getitem__:
    input_np = self.create_rgbdm(image_rgb.squeeze(0).numpy().transpose(1, 2, 0),
                                 image_Gt.squeeze(0).numpy())
    input_tensor = transforms.ToTensor()(input_np)
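Put together, __getitem__ looks roughly like this (simplified sketch; input_transform / target_transform are only placeholder names for the actual transform pipelines, and the body of create_rgbdm is left out):

def __getitem__(self, idx):
    filename = self.filenames[idx]
    filenameGt = self.filenamesGt[idx]

    # load input image and ground truth with PIL
    image_rgb = Image.open(filename)
    image_Gt = Image.open(filenameGt)

    # torchvision transforms, ending with ToTensor()
    # (placeholder attribute names, not the actual ones)
    image_rgb = self.input_transform(image_rgb)
    image_Gt = self.target_transform(image_Gt)

    # custom OpenCV/numpy preprocessing on the CPU
    input_np = self.create_rgbdm(image_rgb.squeeze(0).numpy().transpose(1, 2, 0),
                                 image_Gt.squeeze(0).numpy())
    input_tensor = transforms.ToTensor()(input_np)
    return input_tensor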
The DataLoader arguments in the training script are the following:
train_data_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=16, shuffle=True,
    num_workers=8, pin_memory=True)
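For context, the loader is consumed in a standard training loop roughly like this (simplified sketch; the actual model, loss, and optimizer are left out, and the non_blocking=True copy is just the usual counterpart to pin_memory=True):

device = torch.device('cuda')

for batch in train_data_loader:
    # with pin_memory=True the host-to-GPU copy can run asynchronously
    batch = batch.to(device, non_blocking=True)
    # ... forward pass, loss, backward pass, optimizer step ...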
The problem is that changing num_workers from 0 to 2, 4, or 8 does not decrease the data loading time significantly, nor does it solve the GPU utilisation problem. Is the custom preprocessing function self.create_rgbdm() at the end the problem when the DataLoader runs with multiple workers? Should the function be called outside of __getitem__? Or could something else be the reason?
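For reference, the per-iteration data loading time can be measured with something like the following (simplified sketch, not the exact timing code from my script; torch.cuda.synchronize() is only there to get accurate timings):

import time

end = time.time()
for i, batch in enumerate(train_data_loader):
    data_time = time.time() - end    # time spent waiting for the DataLoader
    # ... transfer to GPU, forward/backward pass, optimizer step ...
    torch.cuda.synchronize()         # wait for GPU work to finish before timing
    batch_time = time.time() - end   # total time of this iteration
    end = time.time()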
PyTorch version: 1.5 with CUDA 10.1
GPU: NVIDIA RTX 2080 Ti