Hello, I am new to PyTorch. I am trying to run my network on the GPU. Some articles recommend using torch.cuda.set_device(0), since my GPU ID is 0. However, other articles say to convert all of the computation to CUDA, so every operation should be followed by .cuda(). My questions are:
-) Is there any simple way to set PyTorch to GPU mode without calling .cuda() on every instruction? I just want all computation to run on a single GPU.
-) How can I check and make sure that my network is running on the GPU? When I use torch.cuda.set_device(0) and check with nvidia-smi, I get 0% volatile GPU utilization. This is different from TensorFlow or Caffe, which show more than 10%. I am afraid my PyTorch code is still using the CPU.
-Thank you-
generally speaking, the pattern is:
- use .cuda() on any input batches/tensors
- use .cuda() on your network module, which will hold your network, like:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer1 = nn. …
        self.layer2 = nn. …
        … etc …

then just do:
model = MyModel()
model.cuda()
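For completeness, here is a minimal runnable sketch of that pattern end to end; the class body and layer sizes are made up for illustration, only the .cuda() calls are the point:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyModel(nn.Module):           # placeholder architecture, just for the sketch
    def __init__(self):
        super(TinyModel, self).__init__()
        self.layer1 = nn.Linear(784, 128)
        self.layer2 = nn.Linear(128, 10)

    def forward(self, x):
        return self.layer2(F.relu(self.layer1(x)))

model = TinyModel()
model.cuda()                          # move the parameters to the GPU

batch = torch.randn(32, 784).cuda()   # move each input batch to the GPU as well
output = model(batch)
print(output.is_cuda)                 # True when the forward pass ran on the GPU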
How about using torch.set_default_tensor_type('torch.cuda.FloatTensor')?
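A quick sketch of what that buys you, assuming a CUDA-capable install: after the call, newly created float tensors live on the GPU by default, so you can skip the per-tensor .cuda() calls.

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')

x = torch.randn(3, 3)   # no .cuda() needed, this is already a GPU tensor
print(x.is_cuda)        # True

One caveat: anything that expects CPU tensors (for example code feeding a DataLoader) may then need explicit handling, so it is not a silver bullet.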
From the http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#autograd tutorial, it seems the way they make sure everything is on CUDA is to define a dtype for GPUs, as in:
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
and they have lines like:
# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)
That way it seems possible to avoid the silly .cuda() call everywhere in your code. Right? I'm also new, so I'm checking with others.
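A small sketch of that toggle in action (the sizes are arbitrary): whichever tensor type dtype points at, the same lines run on either the CPU or the GPU.

import torch

dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

x = torch.randn(64, 1000).type(dtype)     # input batch
w1 = torch.randn(1000, 100).type(dtype)   # weight matrix
h = x.mm(w1).clamp(min=0)                 # this matmul runs on the GPU when dtype is the cuda type
print(h.is_cuda)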
Thanks everyone, your solutions are working well in my case. One reason I really like PyTorch is this discussion forum. It helps me a lot!
In addition to what has been discussed so far, I found that adding this line of code:
torch.backends.cudnn.benchmark = True
before the training takes place will improve speed if you are using GPU(s).
Do you know why that is the case, and is there ever a disadvantage to turning the flag on? And if there are no disadvantages, why isn't it on by default?
This question sounds familiar somehow. See this thread on Google Groups.
Is there a simple function which tests whether the GPU is configured correctly?
I did what @hughperkins suggested on the following MNIST example:
Yet my system won't run (the calculation stops with an error).
Thank you.
@Royi I usually do the following, from bash:
nvidia-smi
python -c 'import torch; print(torch.rand(2,3).cuda())'
If the first fails, your drivers have some issue, or you don't have an (NVIDIA) GPU.
If the second fails, your PyTorch installation isn't able to reach the GPU for some reason (e.g. you didn't do conda install cuda80 -c soumith, etc.)
(edit: if both of the above succeed, I never saw any configuration error beyond that, other than my own coding errors. BUT if you try to run on a V100 with CUDA 8 PyTorch, the second statement will hang for ~5 minutes whilst it creates the cache. And it'll do this every time, so it's unusable; you'll need CUDA 9 PyTorch, or not use a V100.)
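If you would rather check from inside Python, a small sketch along the same lines using the standard torch.cuda helpers:

import torch

print(torch.cuda.is_available())          # False means PyTorch cannot see a usable GPU
if torch.cuda.is_available():
    print(torch.cuda.device_count())      # number of visible GPUs
    print(torch.cuda.get_device_name(0))  # name of GPU 0
    print(torch.rand(2, 3).cuda())        # allocate a small tensor on the GPU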
Hi, when I try these commands, the second one fails with: Segmentation fault (core dumped). But when I add CUDA_VISIBLE_DEVICES=1 it works; it only fails when I use CUDA_VISIBLE_DEVICES=0. Can you please tell me why and give any suggestions?
I’m trying to implement the methods at the beginning of this thread as follows:
model = model.cuda()
torch.backends.cudnn.benchmark=True
import time
start = time.time()
model.train()
train_loss = []
train_accu = []
i = 0
for epoch in range(20):
    for data, target in train_loader:
        data, target = (Variable(data).double()).cuda(), (Variable(target).long()).cuda()
        optimizer.zero_grad()
        output = model(data.view(batch_size, 1, 64, 64))
        loss = F.nll_loss(output, target)   # negative log likelihood (goes with log-softmax)
        loss.backward()                     # compute gradients
        train_loss.append(loss.data[0])     # record the loss
        optimizer.step()                    # update the parameters
        prediction = output.data.max(1)[1]  # index of the max log-probability
        accuracy = (prediction.eq(target.data).sum()/batch_size)*100
        train_accu.append(accuracy)
        if i % 10 == 0:
            print('Epoch:', str(epoch), 'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        i += 1
end = time.time()
print('TRAIN TIME:')
print('%.2gs' % (end - start))
But when I train, I just get a constant accuracy of 0%. Am I missing some part where I need to cast to .cuda() ?
prediction.eq(target.data)
returns a byte tensor/variable. Summing it up gives an integer result, and dividing that by the batch size truncates to zero.
Try it with
accuracy = (prediction.eq(target.data).float().sum()/batch_size)*100
Ah yes, it would, wouldn't it! Worked beautifully, thanks!
Would it matter that I've called .cuda() on data before turning it into a Variable, or should I be doing Variable(data).double().cuda()?
Both should work equally well.
I would recommend switching to PyTorch 0.4, as the Tensor and Variable classes are merged in that release.
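In 0.4+ the common idiom is to pick a device once and move both the model and each batch with .to(device); a minimal sketch, reusing the model and train_loader from your snippet above:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for data, target in train_loader:
    data, target = data.to(device), target.to(device)
    output = model(data)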
Hello, I have the same issue and I don't know how to solve it. Could you help me please?
Hi,
I am struggling to run PyTorch on the GPU. I created a simple fully connected network, set batch_size very large so that all the data is fed in a single batch, and moved my model, X and y to the GPU using to('cuda'). The training takes a long time compared to Keras on GPU, and takes about as long as it does when I set os.environ["CUDA_VISIBLE_DEVICES"]="-1" so that training runs on the CPU. I wonder if I am missing an important step to run PyTorch on the GPU.
In fact I do observe a timing difference for a CNN: the GPU runs faster than the CPU. However, I cannot manage to reproduce that for a fully connected network, and changing the size of the network doesn't change the conclusion.
Is there any test code for a fully connected deep network running on GPU? All examples on the web that I can find are CNNs.
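Not an official benchmark, but here is a minimal fully connected sketch you could time on CPU vs GPU (all sizes are arbitrary; for small fully connected layers the per-batch overhead can hide the GPU speedup, so the layers here are made fairly wide):

import time
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Arbitrary sizes, wide enough that the matrix multiplies dominate the runtime
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 10),
).to(device)

x = torch.randn(8192, 1024, device=device)

start = time.time()
for _ in range(50):
    y = model(x)
if device.type == 'cuda':
    torch.cuda.synchronize()   # wait for queued GPU work before stopping the clock
print(device, '%.2fs' % (time.time() - start))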