Hi, thanks for the amazing work. I am using an RTX 3090 Ti GPU and training on 1M images. GPU utilization fluctuates and training is not as fast as it should be. When we trained on a small dataset of 8k images, GPU utilization was stable, but with 1M images it constantly fluctuates between 0 and 98% and training is slow. We have already varied the batch size and the number of workers. My system has the following specs:
GPU: RTX 3090 Ti
CUDA: 11.6
PyTorch: 1.12.1 (stable)
YOLOv5 model: yolov5m
nproc: 16
Could you please suggest some promising directions to fix this issue?
“it is constantly fluctuating between 0 - 98% and slow”: maybe you are reading data too slowly, so your GPU is waiting for new data most of the time?
Try measuring the time your GPU needs to process a batch versus the time your dataloader needs to produce the next batch; see the sketch below.
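A minimal sketch of that measurement, assuming a standard PyTorch `DataLoader` that yields `(imgs, targets)` pairs; the names `model`, `loader`, and `device` are placeholders for your actual YOLOv5 model, dataloader, and CUDA device, not part of the YOLOv5 codebase:

```python
import time
import torch

def profile_loader_vs_gpu(model, loader, device, max_batches=50):
    """Compare time spent waiting on the dataloader vs. GPU compute time."""
    model.train()
    data_time, gpu_time, n = 0.0, 0.0, 0
    end = time.perf_counter()
    for imgs, targets in loader:
        data_time += time.perf_counter() - end   # time blocked on the loader

        imgs = imgs.to(device, non_blocking=True)
        torch.cuda.synchronize()                 # ensure accurate GPU timing
        start = time.perf_counter()
        _ = model(imgs)                          # forward pass only, for timing
        torch.cuda.synchronize()
        gpu_time += time.perf_counter() - start

        n += 1
        end = time.perf_counter()
        if n >= max_batches:
            break
    print(f"data loading: {data_time / n:.3f} s/batch, "
          f"GPU compute: {gpu_time / n:.3f} s/batch")
```

If the data-loading time per batch is larger than the GPU time, the dataloader is the bottleneck and the fluctuating utilization is explained.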
If you have 1M images, maybe you should use a storage format suited to large image datasets, such as HDF5 (h5) files.
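A rough sketch of what that could look like with `h5py`; the packing helper, the array layout, and the `H5ImageDataset` class are illustrative assumptions, not YOLOv5 code (it assumes images are pre-resized to a fixed size and stored as uint8):

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset

def pack_images(image_arrays, path="train_images.h5", size=(640, 640)):
    """One-time step: write a list of (H, W, 3) uint8 arrays into one HDF5 file."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "images",
            shape=(len(image_arrays), *size, 3),
            dtype="uint8",
            chunks=(1, *size, 3),  # one image per chunk -> fast random reads
        )
        for i, img in enumerate(image_arrays):
            dset[i] = img

class H5ImageDataset(Dataset):
    """Reads images from the HDF5 file; the handle is opened lazily in each
    worker process so it is safe to use with num_workers > 0."""
    def __init__(self, path):
        self.path = path
        self.file = None
        with h5py.File(path, "r") as f:
            self.length = f["images"].shape[0]

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.file is None:  # open once per worker
            self.file = h5py.File(self.path, "r")
        img = self.file["images"][idx]
        return torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
```

One big file with one-image chunks avoids the per-file open/seek overhead of 1M small JPEGs, which is often what starves the GPU at this dataset scale.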