GPU not working

Hello,

I’m training a CNN using the same example code as https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/convolutional_neural_network/main-gpu.py

However, when I run the code on the AWS instance, my CPUs run at 100% but the GPU shows 0% utilization (although GPU memory usage increases and a process is listed as running on it).


Any idea why this might be happening? I’d appreciate any insight.

Thanks!

Is the utilization 0% for the whole run?

At first it goes up to 5% or 6%, then it drops to 0%.

Try setting batch_size to 1000, 2000 or even larger.

Try setting num_workers to something > 0 on the data loader.

I tried this, no changes.

If num_workers is anything > 0, I get the following error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Usually you want the data loader to return CPU tensors with pin_memory=True, and move them to the GPU before training on them.

Yes exactly, inside my data loader I’m loading the image and returning it along with the label as torch.FloatTensor. Then in the training I’m calling .cuda() on both the images and the labels before passing them to the model.
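A minimal sketch of that pattern, with a synthetic dataset standing in for the real one. The class name RandomImageDataset, the image shape, and the batch size are assumptions for illustration, not the actual code from this thread. The point is that __getitem__ returns CPU tensors only, and the transfer to the GPU happens in the training loop:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImageDataset(Dataset):
    """Hypothetical stand-in for the real image dataset (random data)."""

    def __init__(self, n=64):
        self.images = torch.randn(n, 1, 28, 28)    # float32 CPU tensors
        self.labels = torch.randint(0, 10, (n,))   # int64 class indices

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return CPU tensors only; no .cuda() calls inside the dataset.
        return self.images[idx], self.labels[idx]


# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(RandomImageDataset(), batch_size=16, shuffle=True)

for images, labels in loader:
    # Move each batch to the GPU in the training loop, not in the dataset.
    images = images.to(device)
    labels = labels.to(device)
    # The forward and backward passes would go here.
```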

Is this what you mean? Do you have a better approach?

Thanks for the help!

You are right. Sorry I misread the code.

I am running your original script on my box and seeing ~25% GPU usage constantly during training.

Here is a gist that shows how to add num_workers and pin_memory properly. Note that you need to wrap the script's entry point in if __name__ == '__main__': for multiprocessing to work. This doubles my GPU usage, so it is worth trying as well.
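For readers without access to the gist, here is a rough sketch of the same idea. It is not the gist's contents: the helper names make_loader and train_one_epoch, the random data, and the tiny linear model (used in place of the tutorial's CNN) are all placeholders chosen to keep the example short.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=2, batch_size=100):
    # Random CPU tensors standing in for the real training images.
    data = TensorDataset(torch.randn(400, 1, 28, 28),
                         torch.randint(0, 10, (400,)))
    return DataLoader(data, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers,
                      # Pinned host memory only helps when a GPU is present.
                      pin_memory=torch.cuda.is_available())


def train_one_epoch(loader, device):
    # Placeholder model; the tutorial's CNN would be used here instead.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    steps = 0
    for images, labels in loader:
        # non_blocking=True lets copies from pinned memory overlap with compute.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        steps += 1
    return steps


if __name__ == '__main__':
    # The guard keeps worker processes from re-running the training code
    # when they import this module.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("batches processed:", train_one_epoch(make_loader(), device))
```

Setting num_workers=0 in make_loader gives a single-process baseline to compare GPU utilization against.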
