Hi, I am training a deep learning model that performs transformations on the fly. However, training is very slow, and I am trying to figure out the bottleneck: disk I/O, CPU, and/or GPU.
I am running a machine with 16 CPUs, a P100 GPU, and a 200 GB persistent disk on GCP.
This is the output of htop:

This is the output of iotop:
Can anyone help me understand what the bottleneck is?
From the htop output, CPU usage seems to be ~100%, and from the iotop output, I/O doesn't seem to be the bottleneck.
I ran cProfile on a smaller dataset, and the DataLoader seems to take up about 80% of the time.
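For reference, this is roughly how I profiled it. This is a minimal, self-contained sketch: `loader` and `one_epoch` are stand-ins for my actual DataLoader and training loop (which do the real on-the-fly transformations and the forward/backward pass), not the real code.

```python
import cProfile
import pstats

def loader():
    # Stand-in for torch.utils.data.DataLoader doing per-sample
    # transformations on the fly; the real transforms live here.
    for _ in range(1000):
        yield [x * 0.5 for x in range(256)]

def one_epoch():
    # Stand-in for one training epoch (forward/backward pass omitted).
    total = 0.0
    for batch in loader():
        total += sum(batch)
    return total

profiler = cProfile.Profile()
profiler.enable()
one_epoch()
profiler.disable()

# Print the 10 functions with the highest cumulative time; in my real run,
# the DataLoader-related calls dominate this list.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```

In the output of `print_stats`, the `cumtime` column is what showed the data-loading functions accounting for ~80% of the run.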
So, what exactly is the bottleneck here?