Runtime error occurs when using .cuda(1)

Hi all,

I'm trying to use PyTorch on the 2nd GPU:

```python
a = torch.ones(1).cuda(1)
b = torch.ones(1).cuda(1)
c = torch.cat((a, b), 0)
```

Then this error comes up:

```
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.7_1485444530918/work/torch/lib/THC/generic/THCTensorCopy.c:65
```

How can I fix this?

In addition, how do I set different learning rates for different layers?
I think using

```python
for param_group in optimizer.state_dict()['param_groups']:
    param_group['lr'] = lr
```

can only set the learning rate for the whole model.

Yes, we’re aware of the bug in `cat`. It will be fixed over the weekend.

About the second question, see the per-parameter options section of the optim docs.
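
For example, something like this (a minimal sketch following the docs; the two-part `Net` with `base` and `classifier` submodules is hypothetical, just for illustration):

```python
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    """Hypothetical two-part model, just for illustration."""
    def __init__(self):
        super(Net, self).__init__()
        self.base = nn.Linear(10, 10)
        self.classifier = nn.Linear(10, 2)

model = Net()

# Per-parameter options: each dict defines a parameter group with its
# own settings; anything not given in a group falls back to the keyword
# defaults (lr=1e-2, momentum=0.9 here).
optimizer = optim.SGD([
    {'params': model.base.parameters()},                     # default lr=1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3},   # its own lr
], lr=1e-2, momentum=0.9)
```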


@apaszke Thank you so much!

I encountered the same problem. I have just updated to the latest version, but the error still arises. Has it been fixed? If not, is there any workaround?

What's more, this error arises only when I use GPU 1, 2, or 3 on my PC. No error arises if I use GPU 0. Not sure whether this is related to the issue: torch.cat puts result on current GPU rather than GPU of inputs.

For now, it seems that I can work around it by using GPU 0.
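
If the linked issue is right, the behavior would look something like this (a sketch, assuming GPU 0 is the current device):

```python
import torch

# On GPU 0, the current device matches the inputs, so cat succeeds:
a = torch.ones(1).cuda(0)
b = torch.ones(1).cuda(0)
c = torch.cat((a, b), 0)  # result lands on GPU 0, same as the inputs

# On GPU 1, the inputs live on device 1 but, per the linked issue, cat
# allocates its result on the current device (0), which is what triggers
# the illegal memory access.
```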

A temporary workaround is to wrap the `torch.cat` calls in `with torch.cuda.device_of(tensor):`, where `tensor` can be e.g. the first element of the concatenated sequence. A fix is waiting in this PR.
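
Applied to the snippet from the original post, the workaround would look like this:

```python
import torch

a = torch.ones(1).cuda(1)
b = torch.ones(1).cuda(1)

# device_of switches the current device to the one holding `a`, so the
# result of cat is allocated on the same GPU as its inputs.
with torch.cuda.device_of(a):
    c = torch.cat((a, b), 0)
```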
