Runtime error (77)


(Peter Ham) #1

I am trying to sum a Variable using torch.sum() on the GPU, and I get the following error:

 THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1501972792122/work/pytorch-0.1.12/torch/lib/THC/generated/../THCReduceAll.cuh line=334 error=77 : an illegal memory access was encountered

I searched and found some solutions such as https://github.com/torch/cutorch/issues/489 but none of them work. Any suggestions?


(Simon Wang) #2

Interesting! Do you have a repro script for us to debug?


(Peter Ham) #3

Thanks. Here is the code:

    loss = (target_value - out).pow(2).sum()

where target_value and out are both Variables on GPU.


(Peter Ham) #4

After upgrading to 0.2.0, the error still occurs, now with a different message:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu line=313 error=59 : device-side assert triggered

Segmentation fault (core dumped)


(Simon Wang) #5

This line of code has no issue. Could you post a reproducing script please? Thanks!


(Peter Ham) #6

What do you mean by a reproducing script?


(Simon Wang) #7

I meant a code snippet that can be used to reproduce the error you are seeing.
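
As a rough illustration only, a reproducing script for the line posted above might look like the sketch below. The tensor shapes, the seed, and the helper names `make_inputs` and `squared_error` are hypothetical and not taken from the original code; the real script should build `target_value` and `out` the same way the failing program does. The sketch falls back to the CPU when no GPU is available, so it may well run cleanly, which would itself suggest that the error originates somewhere else.

```python
import torch
from torch.autograd import Variable  # deprecated in recent PyTorch; returns a plain tensor there


def make_inputs(use_cuda):
    # Hypothetical shapes and random values: replace these with the real
    # tensors (or the code that produces them) from the failing run.
    torch.manual_seed(0)
    out = torch.randn(4, 3)
    target_value = torch.randn(4, 3)
    if use_cuda:
        out, target_value = out.cuda(), target_value.cuda()
    return Variable(target_value), Variable(out)


def squared_error(target_value, out):
    # The expression from the post above, unchanged.
    return (target_value - out).pow(2).sum()


use_cuda = torch.cuda.is_available()
target_value, out = make_inputs(use_cuda)

# Print the sizes and the device choice so that anyone running the script
# can compare their setup with the one that fails.
print(target_value.size(), out.size(), use_cuda)

loss = squared_error(target_value, out)
print(loss)
```

If a script like this runs without error, gradually adding the surrounding code (data loading, model, indexing) until the failure reappears is one way to narrow down where it comes from.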


(Peter Ham) #8

It seems that moving the code to another place and running it again solves the problem…