I am trying to run a PGGAN on a single GPU, but PyTorch does not seem to use the GPU at all: CPU usage is very high, whereas TensorFlow has no problem using my GPU.
I am using CUDA 10 and PyTorch 1.0, so I don't think there is a version compatibility issue.
When I run torch.cuda.is_available() it returns True, so PyTorch is able to find my GPU.
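For reference, here is the minimal sanity check I run outside of the PGGAN code (just a sketch, independent of my training script) to confirm that a tensor explicitly moved with .cuda() actually lands on the GPU:

import torch

# Sanity check, separate from the PGGAN code: confirm CUDA is visible
# and that an explicitly moved tensor ends up on the GPU.
print(torch.cuda.is_available())        # prints True on my machine
print(torch.cuda.get_device_name(0))    # name of GPU 0

x = torch.randn(4, 4).cuda()            # allocate on CPU, then move to GPU 0
print(x.device)                         # should print cuda:0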
Therefore, I am wondering if there is an issue in the code. Here are the two parts where the problem might be, but I am not able to find it:
In the general settings, I set the number of GPUs to 1, even though the ID of my GPU is 0:
parser.add_argument('--n_gpu', type=int, default=1) # for Multi-GPU training.
And also here:
class trainer:
    def __init__(self, config):
        self.config = config
        if torch.cuda.is_available():
            self.use_cuda = True
            torch.set_default_tensor_type('torch.cuda.FloatTensor')
        else:
            self.use_cuda = False
            torch.set_default_tensor_type('torch.FloatTensor')

        self.nz = config.nz
        self.optimizer = config.optimizer
        self.resl = 2  # we start from 2^2 = 4
        self.lr = config.lr
        self.eps_drift = config.eps_drift
        self.smoothing = config.smoothing
        self.max_resl = config.max_resl
        self.trns_tick = config.trns_tick
        self.stab_tick = config.stab_tick
        self.TICK = config.TICK
        self.globalIter = 0
        self.globalTick = 0
        self.kimgs = 0
        self.stack = 0
        self.epoch = 0
        self.fadein = {'gen': None, 'dis': None}
        self.complete = {'gen': 0, 'dis': 0}
        self.phase = 'init'
        self.flag_flush_gen = False
        self.flag_flush_dis = False
        self.flag_add_noise = self.config.flag_add_noise
        self.flag_add_drift = self.config.flag_add_drift

        # network and criterion
        self.G = net.Generator(config)
        self.D = net.Discriminator(config)
        print('Generator structure: ')
        print(self.G.model)
        print('Discriminator structure: ')
        print(self.D.model)
        self.mse = torch.nn.MSELoss()
        if self.use_cuda:
            self.mse = self.mse.cuda()
            torch.cuda.manual_seed(config.random_seed)
            if config.n_gpu == 1:
                self.G = torch.nn.DataParallel(self.G).cuda(device=0)
                self.D = torch.nn.DataParallel(self.D).cuda(device=0)
            else:
                gpus = []
                for i in range(config.n_gpu):
                    gpus.append(i)
                self.G = torch.nn.DataParallel(self.G, device_ids=gpus).cuda()
                self.D = torch.nn.DataParallel(self.D, device_ids=gpus).cuda()
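In case it helps with diagnosing, here is a quick follow-up check I can run right after constructing the trainer (a sketch; trainer and config are the objects from the code above) to see where the networks' parameters actually live:

# Sketch: verify where the networks' parameters end up after construction.
t = trainer(config)
print(next(t.G.parameters()).device)    # I would expect cuda:0 here
print(next(t.D.parameters()).device)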
One last piece of information: I am doing all of this on Windows.
Thank you very much for your help; I have been trying to figure out what's wrong for weeks without finding the right answer.