[SOLVED] Make Sure That PyTorch Is Using the GPU to Compute

(Herleeyandi Markoni) #1

Hello, I am new to PyTorch and am trying to run my network on the GPU. Some articles recommend calling torch.cuda.set_device(0), since my GPU ID is 0. Other articles say I should move all of the computation to CUDA, so that every operation is followed by .cuda(). My questions are:
-) Is there a simple way to put PyTorch into GPU mode without calling .cuda() on every instruction? I just want all computation to run on one GPU.
-) How can I check that the network is really running on the GPU? When I use torch.cuda.set_device(0) and check with nvidia-smi, I see 0% volatile GPU utilization. With TensorFlow or Caffe it is more than 10%. I am afraid that PyTorch is still using the CPU.
-Thank you-

(Hugh Perkins) #2

Generally speaking, the pattern is:

  • use .cuda() on any input batches/tensors

  • use .cuda() on your network module, which will hold your network, like:

    class MyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer1 = nn. …
            self.layer2 = nn. …
            … etc …

then just do:

    model = MyModel()
    model.cuda()
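The pattern above can be sketched end to end as follows. This is a minimal illustration, not the poster's code: TinyNet and its layer sizes are hypothetical, and it uses .to(device) (available from PyTorch 0.4) instead of .cuda() so that the same script also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyNet(nn.Module):  # hypothetical example network
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(8, 16)
        self.layer2 = nn.Linear(16, 2)

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))


model = TinyNet().to(device)          # moves every parameter and buffer
batch = torch.randn(4, 8).to(device)  # inputs must live on the same device
output = model(batch)

# Both of these report where the computation actually happened.
print(next(model.parameters()).device, output.device)
```

If both printed devices read cuda:0, the forward pass ran on the GPU.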

(SunYeop Lee) #3

How about using torch.set_default_tensor_type('torch.cuda.FloatTensor')?
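A small sketch of how that suggestion could be tried. The guard on torch.cuda.is_available() is an addition for safety, since requesting the CUDA tensor type on a machine without a GPU fails. Note that newer PyTorch releases mark torch.set_default_tensor_type as deprecated in favor of torch.set_default_device, so treat this as an illustration of the idea rather than a recommendation.

```python
import torch

# Only switch the default when a GPU is actually usable.
if torch.cuda.is_available():
    torch.set_default_tensor_type('torch.cuda.FloatTensor')

# Tensors created without an explicit device now follow the default.
x = torch.zeros(2, 3)
print(x.is_cuda)
```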

(Rene Sandoval) #4

From the tutorial at http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#autograd it seems that the way they make sure everything is on the GPU is to define a dtype for GPUs, as in:

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

and they have lines like:

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

That way it seems possible to avoid the silly .cuda() line everywhere in your code. Right? I'm also new, so I'm checking with others.
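For comparison, here is a sketch of the same idea using the device= and dtype= keyword arguments that tensor factory functions accept from PyTorch 0.4 onward. The sizes and the forward computation are chosen only for illustration and are assumptions, not a copy of the tutorial.

```python
import torch

# One switch at the top of the script decides where everything lives.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float32

D_in, H, D_out = 1000, 100, 10  # sizes chosen for illustration

# Randomly initialize weights directly on the chosen device.
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

# A two-layer forward pass with a ReLU in between.
x = torch.randn(64, D_in, device=device, dtype=dtype)
y_pred = x.mm(w1).clamp(min=0).mm(w2)
print(y_pred.shape, y_pred.device)
```

Creating tensors on the target device avoids the extra copy that .type(dtype) or .cuda() would make after allocation on the CPU.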

(Herleeyandi Markoni) #5

Thanks everyone, your solutions work well in my case. One reason I really like PyTorch is the discussion forum. It helps me a lot!

(Ismail Elezi) #6

In addition to what has been discussed so far, I found that adding this line of code:

cudnn.benchmark = True

before training starts improves speed if you are using GPU(s).
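A minimal sketch of where that flag lives, assuming the cudnn name in the line above refers to the torch.backends.cudnn module. The flag asks cuDNN to time several convolution algorithms and keep the fastest. A commonly cited caveat is that this tends to pay off when input shapes stay the same between iterations, and may cost time when they keep changing.

```python
import torch

# Enable cuDNN autotuning before the training loop starts.
torch.backends.cudnn.benchmark = True

# Report whether cuDNN is usable at all and the current flag value.
print(torch.backends.cudnn.is_available(), torch.backends.cudnn.benchmark)
```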

(Ismail Elezi) #7


Do you know why that is the case, and is there ever a disadvantage to turning the flag on? And if there are no disadvantages, why isn't it on by default?

(Hugh Perkins) #8

This question sounds familiar somehow :) https://groups.google.com/forum/#!topic/torch7/CkB57025yRY

(Hugh Perkins) #9

Also note .fastest:

(Royi) #10

Is there a simple function which tests whether the GPU is configured correctly?
I did what @hughperkins suggested on the following MNIST example:

Yet my system won't run (the calculation stops with an error).

Thank You.

(Hugh Perkins) #11

@Royi I usually do the following, from bash:

nvidia-smi
python -c 'import torch; print(torch.rand(2,3).cuda())'

If the first fails, your drivers have some issue, or you don't have an (NVIDIA) GPU :P

If the second fails, your PyTorch installation isn't able to contact the GPU for some reason (e.g. you didn't do conda install cuda80 -c soumith etc…)

(edit: if both of the above succeed, I have never seen any configuration error beyond that, other than my own coding errors :P BUT if you try to run on a V100 using CUDA 8 PyTorch, the second statement will hang for ~5 minutes whilst it creates the cache. It does this every time, so it's useless, and you'll need to use CUDA 9 PyTorch (or not use a V100).)
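The same check can be wrapped in a small Python function. This is only a sketch: describe_cuda is a hypothetical helper name, not a PyTorch API, although the torch.cuda calls it uses are standard. It reports what PyTorch can see and runs one tiny computation on the GPU when one is available.

```python
import torch


def describe_cuda():
    """Return a short, human-readable summary of what PyTorch can see."""
    if not torch.cuda.is_available():
        return "CUDA not available: PyTorch will run on the CPU"
    lines = ["CUDA available, {} device(s)".format(torch.cuda.device_count())]
    for idx in range(torch.cuda.device_count()):
        lines.append("  [{}] {}".format(idx, torch.cuda.get_device_name(idx)))
    # A tiny computation confirms that kernels can actually be launched.
    probe = torch.rand(2, 3, device="cuda")
    lines.append("  probe tensor on {}, sum={:.3f}".format(probe.device, probe.sum().item()))
    return "\n".join(lines)


summary = describe_cuda()
print(summary)
```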

(MirandaAgent) #12



(Iker Peng) #13

Hi, when I try this code, the second command fails with: Segmentation fault (core dumped). But when I add CUDA_VISIBLE_DEVICES=1 it works; it fails only when I use CUDA_VISIBLE_DEVICES=0. Can you please tell me why and give any suggestions?


I’m trying to implement the methods at the beginning of this thread as follows:

model = model.cuda()


import time
start = time.time()
train_loss = []
train_accu = []
i = 0
for epoch in range(20):
    for data, target in train_loader:
        data, target = (Variable(data).double()).cuda(), (Variable(target).long()).cuda()
        output = model(data.view(batch_size,1,64,64))
        optimizer.zero_grad()  # clear gradients left over from the previous step
        loss = F.nll_loss(output, target) # Negative log likelihood (expects log-softmax outputs).
        loss.backward()    # calc gradients
        train_loss.append(loss.data[0]) # Record the loss
        optimizer.step()   # update weights
        prediction = output.data.max(1)[1]   # index of the largest output = predicted class
        accuracy = (prediction.eq(target.data).sum()/batch_size)*100
        if i % 10 == 0:
            print('Epoch:',str(epoch),'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        i += 1
end = time.time()
print('TRAIN TIME:')

But when I train, I just get a constant accuracy of 0%. Am I missing some part where I need to cast to .cuda() ?

(Justus Schock) #15

prediction.eq(target.data) returns a byte tensor/variable. Summing it and dividing by the batch size keeps the arithmetic in integers, so the result comes out as zero.

Try it with

accuracy = (prediction.eq(target.data).float().sum()/batch_size)*100
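A self-contained sketch of that fix on made-up data (the four predictions and labels below are hypothetical). Casting the comparison result to float before averaging keeps the fraction intact regardless of how a given PyTorch version divides integer tensors.

```python
import torch

# Hypothetical predictions and labels for a batch of four samples.
prediction = torch.tensor([1, 0, 2, 2])
target = torch.tensor([1, 0, 1, 2])

correct = prediction.eq(target)  # element-wise comparison, not yet a float
# Cast to float first, then take the mean to get the fraction of correct predictions.
accuracy = correct.float().mean().item() * 100
print('Accuracy: {:.1f}%'.format(accuracy))
```

Here three of the four predictions match, so the reported accuracy is 75%.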


Ah yes it would wouldn’t it! Worked beautifully, thanks!


Would it matter that I've called .cuda() on data before turning it into a Variable, or should I be doing Variable(data).double().cuda()?

(Justus Schock) #18

Both should work equally well.
I would recommend switching to PyTorch 0.4, as both classes (Tensor and Variable) are merged in that release.
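As a rough sketch of what one training step from this thread might look like in the 0.4+ style: the model, optimizer, and random batch below are hypothetical stand-ins, not the poster's network or data. Plain tensors replace Variable, loss.item() replaces loss.data[0], and .to(device) replaces .cuda().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the model, optimizer and one batch of 64x64 images.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 10), nn.LogSoftmax(dim=1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.randn(8, 1, 64, 64)
target = torch.randint(0, 10, (8,))

# Plain tensors track gradients since 0.4, so no Variable wrapper is needed.
data, target = data.to(device), target.to(device)

optimizer.zero_grad()              # clear gradients from any previous step
output = model(data)               # forward pass on the chosen device
loss = F.nll_loss(output, target)  # expects log-softmax outputs
loss.backward()                    # compute gradients
optimizer.step()                   # update weights

prediction = output.argmax(dim=1)
accuracy = prediction.eq(target).float().mean().item() * 100
print('Loss: {:.3f}\tAccuracy: {:.3f}'.format(loss.item(), accuracy))
```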