I ran the resnet18 ImageNet example on an AWS p3.2xlarge with the official Amazon deep learning AMI, and I'm finding that the data loader is a bottleneck. The evidence is both a lack of constant GPU utilization, and the fact that when I let the training loop run on a constant batch (no iteration through the dataloader), the time to train an epoch is cut by nearly 50% (with full GPU utilization). Has anyone else experienced this? If you want to replicate the issue exactly, replace the data loader loop with a "for i in range(len(dataloader))" and set the data to some constant batch so that nothing actually has to be read from disk.
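For anyone who wants to reproduce the diagnostic, here is a minimal sketch of the constant-batch trick described above. The model and dataset here are tiny stand-ins (not the actual ResNet-18/ImageNet setup); the point is only the loop structure: grab one batch, then iterate `range(len(loader))` without touching the loader again.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in dataset/model so the sketch is self-contained.
dataset = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))
loader = DataLoader(dataset, batch_size=16)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Diagnostic: reuse one constant batch instead of iterating the loader,
# so no data is read or transformed per step. If an "epoch" of this runs
# much faster than a real epoch, the input pipeline is the bottleneck.
inputs, targets = next(iter(loader))
for i in range(len(loader)):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

Timing this loop against the normal `for inputs, targets in loader:` loop gives the ~50% gap mentioned above.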
Hi, have you solved the problem? I'm running into this too.
So the issue is that the p3 has insufficient CPU compute to keep the GPU fed: you need that CPU for loading images from disk and for transforming each image. Some solutions:

- Change the loading format to something that loads faster, like HDF5. I haven't tested this, but I know that loading is a huge bottleneck.
- You might also notice that the problem goes away if you use a larger model: the GPU becomes the bottleneck, as desired. What we're really after, though, is cost-effective training and not being wasteful, so in some cases the g3 might be a better option, since it has a higher ratio of CPU compute to GPU.
- Experiment with using fewer transforms, or try implementing transforms that would normally run on the CPU inside the model itself, so they're done by the GPU. There may also be opportunities to improve the efficiency of the transforms.
- Lastly, setting num_workers to 8 helped me.
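The num_workers suggestion can be sketched concretely. This is a minimal example with a stand-in dataset (not the ImageNet pipeline): more workers decode and transform batches in parallel on the CPU, and `pin_memory=True` speeds up host-to-GPU copies. The value 8 is just what worked in this case; tune it to your CPU core count.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; in practice this would be an ImageFolder with transforms.
dataset = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))

# num_workers > 0 runs loading/transforms in parallel worker processes;
# pin_memory=True puts batches in page-locked memory for faster GPU transfer.
loader = DataLoader(dataset, batch_size=16, num_workers=2, pin_memory=True)

batches = 0
for inputs, targets in loader:
    batches += 1  # the training step would go here
```

With a real image pipeline, watch GPU utilization while sweeping num_workers; it should climb toward saturation as the workers keep up with the GPU.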