[SOLVED] Make Sure That Pytorch Using GPU To Compute

Hello, I am new to PyTorch and I am trying to run my network on the GPU. Some articles recommend using torch.cuda.set_device(0), since my GPU ID is 0. Other articles tell me to move all of the computation to CUDA, so every operation should be followed by .cuda(). My questions are:
-) Is there a simple way to make PyTorch run on the GPU, without calling .cuda() on every instruction? I just want all computation to run on one GPU.
-) How can I check and make sure that the network is actually running on the GPU? When I use torch.cuda.set_device(0) and check with nvidia-smi, I get 0% volatile GPU utilization, whereas with TensorFlow or Caffe it is more than 10%. I am afraid that my PyTorch code is still using the CPU.
-Thank you-

10 Likes

generally speaking, the pattern is:

  • use .cuda() on any input batches/tensors

  • use .cuda() on your network module, which will hold your network, like:

    class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()
            self.layer1 = nn. …
            self.layer2 = nn. …
            … etc …

then just do:

model = MyModel()
model.cuda()
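
Putting both pieces together, a minimal sketch might look like this (the Sequential model and the sizes are just placeholders, and it assumes a CUDA-capable machine and a recent PyTorch where tensors can be fed to modules directly):

import torch
import torch.nn as nn

# placeholder model with arbitrary layer sizes, just to illustrate the pattern
model = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))
model.cuda()                   # moves all parameters and buffers to the GPU

inputs = torch.randn(32, 100)  # a batch created on the CPU
inputs = inputs.cuda()         # move the batch to the same GPU
outputs = model(inputs)        # the forward pass now runs on the GPU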
20 Likes

How about using torch.set_default_tensor_type('torch.cuda.FloatTensor')?
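
If I understand that suggestion correctly, it would look roughly like this (a minimal sketch, assuming a CUDA-capable build):

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')

# new float tensors are now allocated on the GPU by default
x = torch.randn(3, 4)
print(x.is_cuda)  # True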

12 Likes

From the http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#autograd tutorial, it seems that the way they make sure everything runs on CUDA is to define a dtype for the GPU, as in:

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

and they have lines like:

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

That way it seems possible to avoid the silly .cuda() call everywhere in your code, as in the sketch below. Right? I'm also new, so I'm checking with others.
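
To make that concrete, a small sketch of the GPU variant (the sizes D_in, H, D_out are just placeholder values):

import torch

# pick the tensor type once; everything created with this dtype lives on the GPU
dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

D_in, H, D_out = 1000, 100, 10          # arbitrary sizes for illustration
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)
print(w1.is_cuda, w2.is_cuda)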

8 Likes

Thanks everyone, your solutions work well in my case. One reason I really like PyTorch is this discussion forum. It helps me a lot!

7 Likes

In addition to what has been discussed so far, I found that adding this line of code:

cudnn.benchmark = True

before the training takes place will improve speed if you are using GPU(s).
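
For completeness, the flag lives under torch.backends, so in full it would be something like this sketch:

import torch.backends.cudnn as cudnn

# let cuDNN benchmark the available convolution algorithms and cache the fastest;
# this helps most when the input sizes do not change between iterations
cudnn.benchmark = True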

10 Likes

@hughperkins

Do you know in which cases there is a disadvantage to turning the flag on? And if there are no disadvantages, why isn’t it on by default?

This question sounds familiar somehow :slight_smile: . Redirecting to Google Groups

1 Like

Also note .fastest:

1 Like

Is there a simple function which tests whether the GPU is configured correctly?
I did what @hughperkins suggested on the following MNIST example:

Yet my system won’t run (the calculation stops with an error).

Thank You.

@Royi I usually do the following, from bash:

nvidia-smi
python -c 'import torch; print(torch.rand(2,3).cuda())'

If the first fails, your drivers have some issue, or you don’t have an (NVIDIA) GPU :stuck_out_tongue:

If the second fails, your PyTorch installation isn’t able to contact the GPU for some reason (e.g. you didn’t do conda install cuda80 -c soumith etc…)

(edit: if both of the above succeed, I never saw any configuration error beyond that, other than my own coding errors :stuck_out_tongue: BUT if you try to run on a V100 using CUDA 8 PyTorch, the second statement will hang for ~5 minutes whilst it creates the cache. And it’ll do this every time, so it’s useless; you’ll need to use CUDA 9 PyTorch (or not use a V100).)
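
A programmatic variant of the same check, if you prefer to run it from Python rather than bash (a minimal sketch):

import torch

print(torch.cuda.is_available())           # can PyTorch see a usable GPU at all?
print(torch.cuda.device_count())           # how many GPUs are visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of GPU 0
    print(torch.rand(2, 3).cuda())         # allocate a small tensor on it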

8 Likes

try:

http://pytorch.org/docs/master/notes/cuda.html

Hi, when I try these commands, the second one fails with Segmentation fault (core dumped). But when I add CUDA_VISIBLE_DEVICES=1 it works; only with CUDA_VISIBLE_DEVICES=0 does it fail. Can you please tell me why and give any suggestion?

1 Like

I’m trying to implement the methods at the beginning of this thread as follows:

model = model.cuda()

torch.backends.cudnn.benchmark=True

import time
start = time.time()
model.train()
train_loss = []
train_accu = []
i = 0
for epoch in range(20):
    for data, target in train_loader:
        data, target = (Variable(data).double()).cuda(), (Variable(target).long()).cuda()
        optimizer.zero_grad()
        output = model(data.view(batch_size,1,64,64))
        loss = F.nll_loss(output, target) # negative log likelihood (expects log-softmax output)
        loss.backward()    # compute gradients
        train_loss.append(loss.data[0]) # record the loss
        optimizer.step()   # update parameters
        prediction = output.data.max(1)[1]   # index of the max log-probability
        accuracy = (prediction.eq(target.data).sum()/batch_size)*100
        train_accu.append(accuracy)
        if i % 10 == 0:
            print('Epoch:',str(epoch),'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        i += 1
end = time.time()
print('TRAIN TIME:')
print('%.2gs'%(end-start))

But when I train, I just get a constant accuracy of 0%. Am I missing some part where I need to cast to .cuda() ?

prediction.eq(target.data) returns a byte tensor/variable. Summing it up and dividing by the batch size is integer division, which yields zero whenever the sum is smaller than the batch size.

Try it with

accuracy = (prediction.eq(target.data).float().sum()/batch_size)*100
2 Likes

Ah yes, it would, wouldn’t it! Worked beautifully, thanks!

Would it matter that I’ve called .cuda() on data before turning it into a Variable, or should I be doing Variable(data).double().cuda()?

Both should work equally well.
I would recommend switching to PyTorch 0.4, as both classes (Variable and Tensor) are merged in that release.
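
In 0.4-style code the usual pattern is the device/.to() idiom, roughly like this sketch (model and train_loader as in the snippet above):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = model.to(device)                               # move the model once
for data, target in train_loader:
    data, target = data.to(device), target.to(device)  # move each batch
    output = model(data)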

Hello, I have the same issue and I don’t know how to solve it. Could you help me, please?

Hi,

I am struggling with running PyTorch on the GPU. I created a simple fully connected network, set batch_size very large to make sure all data is fed in the first batch, and put my model, X and y on the GPU using to('cuda'). The training takes a long time compared to Keras on the GPU, and takes about as long as when I set os.environ["CUDA_VISIBLE_DEVICES"]="-1" so that training runs on the CPU. I wonder if I am missing any step needed to run PyTorch on the GPU.

In fact, I observed the timing difference for a CNN: the GPU runs faster than the CPU. However, I cannot manage to reproduce this for a fully connected network. Changing the size of the network doesn’t change the conclusion.

Is there any test code for a fully connected deep network running on the GPU? All the examples I can find on the web are CNNs.
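
For reference, a minimal sketch of a fully connected network timed on the GPU (the layer sizes, batch size and random data are arbitrary choices, not taken from any particular example):

import time
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# arbitrary sizes, large enough that the GPU has some real work per step
model = nn.Sequential(
    nn.Linear(1024, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 10),
).to(device)

X = torch.randn(4096, 1024, device=device)
y = torch.randint(0, 10, (4096,), device=device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

start = time.time()
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
if device.type == 'cuda':
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
print(device, 'took %.2fs' % (time.time() - start))

Note that with small layers or small batches a fully connected net is often dominated by Python and kernel-launch overhead, so the GPU advantage only becomes obvious once the layers and batch size are reasonably large.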