dvirginz
(Dvir Ginzburg)
September 20, 2020, 8:57am
1
GPU 2080TI
Driver 430.64
cuda 10.1
pytorch 1.6.0
The error below occurs on the following lines of code, and I could not reproduce it artificially:
l = torch.from_numpy(model.train_dataset.template.Laplacian)
a = torch.from_numpy(model.train_dataset.template.A)
evec = template_evecs[0][:,1]
evals = torch.from_numpy(model.train_dataset.template.evals)
eval1 = evals[1]
res = l.matmul(evec) - eval1 * a.matmul(evec)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1595629403081/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
Unfortunately, if I transfer the tensors with .cpu() first, the answer is correct and there are no errors.
Ideas?
Dexter
September 20, 2020, 9:04am
2
My guess is that this is caused by a memory allocation problem on the GPU when the complete data is inserted at once. Have you tried doing it in batches and checking whether the same thing happens?
dvirginz
(Dvir Ginzburg)
September 20, 2020, 10:00am
3
I don’t think I understand you :/ It’s not a complete data insertion, but rather loading precomputed NumPy arrays and moving them to the GPU.
ptrblck
September 22, 2020, 5:54am
4
Could you rerun your code with CUDA_LAUNCH_BLOCKING=1 python script.py args
and check the stack trace as well as the line of code that triggers this issue?
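For context: CUDA kernels are launched asynchronously, so an illegal memory access is often reported at a later, unrelated call (here the host allocator) rather than at the operation that caused it. Setting CUDA_LAUNCH_BLOCKING=1 makes launches synchronous so the stack trace points at the failing line. A minimal sketch of doing the same from Python, assuming the script is started as a subprocess; the helper names `launch_blocking_env` and `run_blocking` are hypothetical, and `script.py` is the placeholder name from the post above:

```python
import os
import subprocess
import sys


def launch_blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script, *args):
    """Run `python script args...` with synchronous CUDA kernel launches.

    Returns the CompletedProcess; the stack trace, if any, is in .stderr.
    """
    cmd = [sys.executable, script, *args]
    return subprocess.run(
        cmd, env=launch_blocking_env(), capture_output=True, text=True
    )


# Example usage (assumes a script.py exists in the working directory):
#   result = run_blocking("script.py", "args")
#   print(result.stderr)
```

The variable has to be in the environment before the CUDA context is created, which is why the sketch sets it for a fresh process instead of changing it in an already running one.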
dvirginz
(Dvir Ginzburg)
September 24, 2020, 7:23am
5
I’m sorry ptrblck, but I cannot reproduce it anymore, super strange.
Thanks for helping:)