However, when I run the code on the AWS instance, my CPUs run at 100% but GPU utilization stays at 0% (although GPU memory usage increases and a process shows up as running on the GPU).
Any idea why this might be happening? I’d appreciate any insight.
If num_workers is set to anything greater than 0, I get the following error: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
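For what it's worth, the error message itself points at one possible fix: switch the multiprocessing start method from the default 'fork' to 'spawn' before any CUDA work happens, so worker processes don't inherit an already-initialized CUDA context. A minimal sketch of that (the main() wrapper here is just illustrative):

```python
import torch.multiprocessing as mp

def main():
    # Use 'spawn' instead of the default 'fork' so DataLoader workers
    # start with a fresh interpreter and no inherited CUDA context.
    mp.set_start_method('spawn', force=True)
    return mp.get_start_method()

if __name__ == '__main__':
    print(main())
```

The other (often simpler) route is to avoid touching CUDA inside the workers at all, i.e. keep .cuda() out of the Dataset.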
Yes, exactly: inside my data loader I load the image and return it along with the label as a torch.FloatTensor. Then in the training loop I call .cuda() on both the images and the labels before passing them to the model.
Is this what you mean? Do you have a better approach?
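Roughly, the pattern I mean is the following (dataset class and tensor shapes are made up for illustration). The key point is that __getitem__ returns CPU tensors, and the device transfer happens per batch in the training loop rather than inside the workers:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ImageDataset(Dataset):
    """Toy stand-in for an image dataset (shapes are illustrative)."""
    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return CPU FloatTensors here; calling .cuda() inside a worker
        # is what triggers the forked-subprocess CUDA error.
        return self.images[idx].float(), self.labels[idx].long()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
loader = DataLoader(ImageDataset(torch.randn(8, 3, 32, 32), torch.zeros(8)),
                    batch_size=4)
for images, labels in loader:
    # Move each batch to the device in the main process.
    images, labels = images.to(device), labels.to(device)
```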
I am running your original script on my box and am seeing a consistent ~25% GPU usage during training.
Here is a gist that shows how to add num_workers and pin_memory properly. Note that you need to wrap things in if __name__ == '__main__': for multiprocessing to work. It doubles the GPU usage on my machine, so you could try this as well.