[SOLVED] Make Sure That Pytorch Using GPU To Compute

Hello, I am new to PyTorch and I am trying to run my network on the GPU. Some articles recommend using torch.cuda.set_device(0), since my GPU ID is 0. Other articles tell me to move all of the computation to CUDA, so every operation should be followed by .cuda(). My questions are:
-) Is there a simple way to make PyTorch run on the GPU, without calling .cuda() on every instruction? I just want all computation to run on one GPU.
-) How can I check and make sure that the network is actually running on the GPU? When I use torch.cuda.set_device(0) and check with nvidia-smi, I get 0% volatile GPU utilization, whereas with TensorFlow or Caffe it is more than 10%. I am afraid that my PyTorch code is still using the CPU.
-Thank you-

10 Likes

generally speaking, the pattern is:

  • use .cuda() on any input batches/tensors

  • use .cuda() on your network module, which will hold your network, like:

    class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()
            self.layer1 = nn. …
            self.layer2 = nn. …
            … etc …

then just do:

model = MyModel()
model.cuda()
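
Putting both pieces together, a minimal sketch might look like this (the Sequential model and the sizes are just placeholders, and it assumes a CUDA-capable machine and a recent PyTorch where tensors can be fed to modules directly):

import torch
import torch.nn as nn

# placeholder model with arbitrary layer sizes, just to illustrate the pattern
model = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))
model.cuda()                   # moves all parameters and buffers to the GPU

inputs = torch.randn(32, 100)  # a batch created on the CPU
inputs = inputs.cuda()         # move the batch to the same GPU
outputs = model(inputs)        # the forward pass now runs on the GPU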
20 Likes

How about using torch.set_default_tensor_type('torch.cuda.FloatTensor')?
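
If I understand that suggestion correctly, it would look roughly like this (a minimal sketch, assuming a CUDA-capable build):

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')

# new float tensors are now allocated on the GPU by default
x = torch.randn(3, 4)
print(x.is_cuda)  # True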

12 Likes

From the http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#autograd tutorial, it seems that the way they make sure everything runs on CUDA is to define a dtype for the GPU, as in:

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

and they have lines like:

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

That way it seems possible to avoid the silly .cuda() call everywhere in your code, as in the sketch below. Right? I'm also new, so I'm checking with others.
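
To make that concrete, a small sketch of the GPU variant (the sizes D_in, H, D_out are just placeholder values):

import torch

# pick the tensor type once; everything created with this dtype lives on the GPU
dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

D_in, H, D_out = 1000, 100, 10          # arbitrary sizes for illustration
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)
print(w1.is_cuda, w2.is_cuda)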

8 Likes

Thanks everyone, your solutions work well in my case. One reason I really like PyTorch is this discussion forum. It helps me a lot!

7 Likes

In addition to what has been discussed so far, I found that adding this line of code:

cudnn.benchmark = True

before the training takes place will improve speed if you are using GPU(s).
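
For completeness, the flag lives under torch.backends, so in full it would be something like this sketch:

import torch.backends.cudnn as cudnn

# let cuDNN benchmark the available convolution algorithms and cache the fastest;
# this helps most when the input sizes do not change between iterations
cudnn.benchmark = True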

10 Likes

@hughperkins

Do you know in which cases there is a disadvantage to turning the flag on? And if there are no disadvantages, why isn’t it on by default?

This question sounds familiar somehow :slight_smile: . Redirecting to Google Groups

1 Like

Also note .fastest:

1 Like

Is there a simple function which tests whether the GPU is configured correctly?
I did what @hughperkins suggested on the following MNIST example:

Yet my system won’t run (the calculation stops with an error).

Thank You.

@Royi I usually do the following, from bash:

nvidia-smi
python -c 'import torch; print(torch.rand(2,3).cuda())'

If the first fails, your drivers have some issue, or you don’t have an (NVIDIA) GPU :stuck_out_tongue:

If the second fails, your PyTorch installation isn’t able to contact the GPU for some reason (e.g. you didn’t do conda install cuda80 -c soumith etc…)

(edit: if both of the above succeed, I never saw any configuration error beyond that, other than my own coding errors :stuck_out_tongue: BUT if you try to run on a V100 using CUDA 8 PyTorch, the second statement will hang for ~5 minutes whilst it creates the cache. And it’ll do this every time, so it’s useless; you’ll need to use CUDA 9 PyTorch (or not use a V100).)
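
A programmatic variant of the same check, if you prefer to run it from Python rather than bash (a minimal sketch):

import torch

print(torch.cuda.is_available())           # can PyTorch see a usable GPU at all?
print(torch.cuda.device_count())           # how many GPUs are visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of GPU 0
    print(torch.rand(2, 3).cuda())         # allocate a small tensor on it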

8 Likes

try:

http://pytorch.org/docs/master/notes/cuda.html

Hi, when I try these commands, the second one fails with Segmentation fault (core dumped). But when I add CUDA_VISIBLE_DEVICES=1 it works; only with CUDA_VISIBLE_DEVICES=0 does it fail. Can you please tell me why and give any suggestion?

1 Like

I’m trying to implement the methods at the beginning of this thread as follows:

model = model.cuda()

torch.backends.cudnn.benchmark=True

import time
start = time.time()
model.train()
train_loss = []
train_accu = []
i = 0
for epoch in range(20):
    for data, target in train_loader:
        data, target = (Variable(data).double()).cuda(), (Variable(target).long()).cuda()
        optimizer.zero_grad()
        output = model(data.view(batch_size,1,64,64))
        loss = F.nll_loss(output, target) # negative log likelihood (expects log-softmax output)
        loss.backward()    # compute gradients
        train_loss.append(loss.data[0]) # record the loss
        optimizer.step()   # update parameters
        prediction = output.data.max(1)[1]   # index of the max log-probability
        accuracy = (prediction.eq(target.data).sum()/batch_size)*100
        train_accu.append(accuracy)
        if i % 10 == 0:
            print('Epoch:',str(epoch),'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        i += 1
end = time.time()
print('TRAIN TIME:')
print('%.2gs'%(end-start))

But when I train, I just get a constant accuracy of 0%. Am I missing some part where I need to cast to .cuda() ?

prediction.eq(target.data) returns a byte tensor/variable. Summing it up and dividing by the batch size is integer division, which yields zero whenever the sum is smaller than the batch size.

Try it with

accuracy = (prediction.eq(target.data).float().sum()/batch_size)*100
2 Likes

Ah yes, it would, wouldn’t it! Worked beautifully, thanks!

Would it matter that I’ve called .cuda() on data before turning it into a Variable, or should I be doing Variable(data).double().cuda()?

Both should work equally well.
I would recommend switching to PyTorch 0.4, as both classes (Variable and Tensor) are merged in that release.
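
In 0.4-style code the usual pattern is the device/.to() idiom, roughly like this sketch (model and train_loader as in the snippet above):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = model.to(device)                               # move the model once
for data, target in train_loader:
    data, target = data.to(device), target.to(device)  # move each batch
    output = model(data)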

Hello, I have the same issue and I don’t know how to solve it. Could you help me, please?

Hi,

I am struggling with running PyTorch on the GPU. I created a simple fully connected network, set batch_size very large to make sure all data is fed in the first batch, and put my model, X and y on the GPU using to('cuda'). The training takes a long time compared to Keras on the GPU, and takes about as long as when I set os.environ["CUDA_VISIBLE_DEVICES"]="-1" so that training runs on the CPU. I wonder if I am missing any step needed to run PyTorch on the GPU.

In fact, I observed the timing difference for a CNN: the GPU runs faster than the CPU. However, I cannot manage to reproduce this for a fully connected network. Changing the size of the network doesn’t change the conclusion.

Is there any test code for a fully connected deep network running on the GPU? All the examples I can find on the web are CNNs.
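
For reference, a minimal sketch of a fully connected network timed on the GPU (the layer sizes, batch size and random data are arbitrary choices, not taken from any particular example):

import time
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# arbitrary sizes, large enough that the GPU has some real work per step
model = nn.Sequential(
    nn.Linear(1024, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 10),
).to(device)

X = torch.randn(4096, 1024, device=device)
y = torch.randint(0, 10, (4096,), device=device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

start = time.time()
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
if device.type == 'cuda':
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
print(device, 'took %.2fs' % (time.time() - start))

Note that with small layers or small batches a fully connected net is often dominated by Python and kernel-launch overhead, so the GPU advantage only becomes obvious once the layers and batch size are reasonably large.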