For my research, I’m running a GAN on FloydHub, an online deep learning platform.
My DCGAN-based model trains on 64x64 images from a data set of >450k images, with a batch size of 128. The inputs to my generator and discriminator are constant in shape (i.e. batches of 128 64x64 images). My local setup isn’t very powerful, and thankfully I have some research funds, so I’m trying to maximize the number of experiments I can run and am therefore using the powerful but expensive Tesla V100 16GB GPU.
I do use cuda, and the GPU is found:

self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("GPU/CPU:", torch.cuda.get_device_name(0))  # prints "GPU/CPU: Tesla V100-SXM2-16GB"
I call .to(device) (the new .cuda()) on my generator, discriminator, loss, noise and label tensors, as well as on the fake and real images.
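To be concrete, here is roughly the pattern (heavily simplified; netG, netD and criterion are stand-ins for my actual generator, discriminator and loss):

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

netG = nn.Sequential(nn.ConvTranspose2d(100, 3, 4)).to(device)  # stand-in generator
netD = nn.Sequential(nn.Conv2d(3, 1, 4)).to(device)             # stand-in discriminator
criterion = nn.BCELoss().to(device)

noise = torch.randn(128, 100, 1, 1, device=device)   # latent batch, created on the GPU
real_label = torch.full((128,), 1.0, device=device)  # label tensor, also on the GPU
fake_images = netG(noise)                            # generator output lives on the GPU too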
When I checked, cudnn.benchmark was left at its default, and I’ve experimented with the num_workers of the DataLoader.
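My data loading is set up roughly along these lines (simplified; the path and the num_workers value are just placeholders for what I’m currently using):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

dataset = datasets.ImageFolder("data/images", transform=transform)  # placeholder path
loader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)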
When I print the weights of a generator layer or a discriminator layer, device='cuda:0' is included in the output, so I know these are on the GPU.
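This is the kind of check I mean (the layer here is just a stand-in for one of my discriminator layers):

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
layer = nn.Conv2d(3, 64, 4).to(device)  # stand-in for one discriminator layer

print(layer.weight)         # the printed tensor includes device='cuda:0'
print(layer.weight.device)  # a more direct check: prints "cuda:0"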
However, my GPU usage hovers at around or below 40%, and a single pass over the data set (3817 batches) takes more than 10 minutes. I’d like to run for 100 epochs, but I’d prefer it didn’t take so long! I figure I must be doing something wrong or forgetting to cuda something (though I can’t think of what that would be), or perhaps there’s some magic cudnn flag I can set [although there doesn’t seem to be much documentation for those flags].
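For reference, the only knobs I’m aware of are the ones under torch.backends.cudnn, e.g.:

import torch.backends.cudnn as cudnn

cudnn.benchmark = True       # let cudnn pick the fastest conv algorithms for fixed input sizes
cudnn.deterministic = False  # trade reproducibility for speed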
Any ideas? Or is this just how long training should take? Many thanks in advance.