GPU utilization too low - best practices for building an efficient data pipeline?

Hello,

I’m trying to run a segmentation task, following the example notebook below:
https://github.com/qubvel/segmentation_models.pytorch/blob/master/examples/cars%20segmentation%20(camvid).ipynb

The GPU utilization seems low; it fluctuates between 0% → 30% → 45%.
I tried increasing the batch_size to the largest value that fits in memory, and also increasing num_workers in the DataLoader, but nothing seems to help. Utilization sits at 0% a lot of the time, which suggests the job is CPU-bound. What are the best practices for building a data pipeline that improves GPU utilization in PyTorch?
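One thing I tried to narrow this down is timing how long each iteration spends waiting on the DataLoader versus computing. A minimal sketch of that check, using a small dummy in-memory dataset as a stand-in for the real CamVid Dataset (so the absolute numbers are illustrative only):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real segmentation dataset
images = torch.randn(64, 3, 64, 64)
masks = torch.randint(0, 2, (64, 1, 64, 64)).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=8, num_workers=0)

data_time, compute_time = 0.0, 0.0
end = time.perf_counter()
for x, y in loader:
    data_time += time.perf_counter() - end   # time spent waiting on the loader
    start = time.perf_counter()
    _ = (x * 2).sum()                        # stand-in for the forward/backward pass
    compute_time += time.perf_counter() - start
    end = time.perf_counter()

print(f"data: {data_time:.3f}s  compute: {compute_time:.3f}s")
```

If data_time dominates compute_time, the input pipeline (decoding/augmentation) is the bottleneck rather than the model itself.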

I also suspect the low GPU utilization could be caused by the data augmentation, which might be happening on the CPU instead of the GPU.
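If augmentation does turn out to be the culprit, one option is to keep the Dataset’s transforms minimal and apply simple augmentations on the GPU as batched tensor ops instead. A sketch with plain torch ops (libraries like Kornia offer a fuller GPU augmentation pipeline; the function name here is my own):

```python
import torch

def gpu_augment(images: torch.Tensor, masks: torch.Tensor):
    """Random horizontal flip applied to a whole batch at once.

    images: (N, C, H, W), masks: (N, 1, H, W) -- flipped together so the
    segmentation labels stay aligned with the pixels. Runs on whatever
    device the tensors live on, so it can execute on the GPU.
    """
    # One flip decision per sample in the batch
    flip = torch.rand(images.size(0), device=images.device) < 0.5
    images = torch.where(flip[:, None, None, None], images.flip(-1), images)
    masks = torch.where(flip[:, None, None, None], masks.flip(-1), masks)
    return images, masks

# Usage: move the batch to the GPU first, then augment there
x = torch.randn(8, 3, 64, 64)
m = torch.randint(0, 2, (8, 1, 64, 64)).float()
x_aug, m_aug = gpu_augment(x, m)
```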

Code segment from example notebook:

train_dataset = Dataset(
    x_train_dir,
    y_train_dir,
    augmentation=get_training_augmentation(),
    preprocessing=get_preprocessing(preprocessing_fn),
    classes=CLASSES,
)

valid_dataset = Dataset(
    x_valid_dir,
    y_valid_dir,
    augmentation=get_validation_augmentation(),
    preprocessing=get_preprocessing(preprocessing_fn),
    classes=CLASSES,
)

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=12)
valid_loader = DataLoader(valid_dataset, batch_size=1, shuffle=False, num_workers=4)
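For reference, here is the kind of DataLoader configuration I was planning to try next. pin_memory, persistent_workers, and prefetch_factor are standard DataLoader arguments, and non_blocking=True on .to() lets host-to-device copies overlap with compute when pinned memory is used (the dummy dataset below is just a stand-in for the notebook’s Dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset as a stand-in for the CamVid Dataset from the notebook
dataset = TensorDataset(torch.randn(32, 3, 64, 64), torch.randn(32, 1, 64, 64))

train_loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=2,            # tune to roughly the number of physical cores
    pin_memory=True,          # page-locked host memory -> faster H2D copies
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=2,        # batches prefetched per worker (2 is the default)
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for x, y in train_loader:
    # non_blocking only overlaps the copy with compute when pin_memory=True
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    break
```

Would these settings be the right direction, or is the fix usually elsewhere (e.g. moving the augmentation off the CPU)?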