Best Practice Guide For GPU


Is there a best practice guide involving writing pytorch code for GPUs?

Like, things to avoid, things to try to do, etc., that will help improve the speed of the forward and backward passes?


There are two things that I found really useful.

  1. Using pinned memory can improve training speed a lot, since it makes host-to-device transfers faster, especially if you are training the network with multiple GPUs.

  2. Enabling torch.backends.cudnn.benchmark allows cuDNN to try several algorithms and pick the fastest ones for both forward and backward propagation, but it may use more memory. It works best when your input sizes do not change between iterations.
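For reference, a minimal sketch combining both tips might look like this. The dataset, batch size, and tensor shapes are placeholders for illustration only:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Tip 2: let cuDNN benchmark convolution algorithms on the first iterations
# and cache the fastest one; best when input shapes stay fixed across batches.
torch.backends.cudnn.benchmark = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tip 1: pinned (page-locked) host memory makes host-to-device copies faster
# and lets them run asynchronously when paired with non_blocking=True.
dataset = TensorDataset(
    torch.randn(256, 3, 32, 32),       # fake images (placeholder data)
    torch.randint(0, 10, (256,)),      # fake labels
)
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    pin_memory=torch.cuda.is_available(),  # pinning only matters with a GPU
)

for images, labels in loader:
    # With pinned source buffers, these copies can overlap with GPU compute.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward / backward pass here ...
```

On a CPU-only machine this still runs; the pinning and asynchronous copies simply become no-ops.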

Thanks, but are there any other practices that are useful for speed?

For example, I noticed that the PyTorch documentation suggests avoiding in-place operations.
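One reason for that suggestion is that an in-place operation can overwrite a tensor that autograd saved for the backward pass, which causes a runtime error. A small sketch of the failure mode (torch.exp saves its output for backward, so editing that output in place breaks it):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.exp(x)   # exp saves its output y for the backward pass

y.add_(1)          # in-place edit bumps y's version counter

try:
    y.sum().backward()
    failed = False
except RuntimeError:
    # autograd detects that a tensor needed for gradient computation
    # was modified by an in-place operation
    failed = True

print("in-place op broke backward:", failed)
```

Beyond correctness, in-place ops can also prevent some internal optimizations, so out-of-place versions are usually the safer default.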