Illegal memory access was encountered on GPU, but no error when using CPU

GPU: 2080 Ti
Driver: 430.64
CUDA: 10.1
PyTorch: 1.6.0

The following error occurs on these lines of code, and I could not reproduce it in a standalone example:

    l = torch.from_numpy(model.train_dataset.template.Laplacian)
    a = torch.from_numpy(model.train_dataset.template.A)
    evec = template_evecs[0][:,1]
    evals = torch.from_numpy(model.train_dataset.template.evals)
    eval1 = evals[1]
    res = l.matmul(evec) - eval1 * a.matmul(evec)
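For reference, `res` is the residual of the generalized eigenproblem L v = λ A v, so it should be close to zero. Here is a minimal sketch that reproduces the computation on synthetic data. It assumes `A` is a positive diagonal (mass) matrix, and the helper `generalized_residual` and the random matrices are hypothetical stand-ins for `template.Laplacian`, `template.A` and `template_evecs`. It also moves every operand onto the eigenvector's device and dtype explicitly, because `torch.from_numpy` always returns CPU tensors.

```python
import numpy as np
import torch

def generalized_residual(L, A, evec, eigval):
    """Return L @ evec - eigval * A @ evec on evec's device and dtype (hypothetical helper)."""
    device, dtype = evec.device, evec.dtype
    L = L.to(device=device, dtype=dtype)
    A = A.to(device=device, dtype=dtype)
    return L.matmul(evec) - eigval * A.matmul(evec)

# Hypothetical synthetic data standing in for template.Laplacian / template.A
rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
L_np = B @ B.T                      # symmetric positive semi-definite "Laplacian"
a_diag = rng.uniform(1.0, 2.0, size=n)
A_np = np.diag(a_diag)              # assumed positive diagonal "mass" matrix

# Solve L v = lambda A v via the symmetric reduction M = A^{-1/2} L A^{-1/2}
inv_sqrt = 1.0 / np.sqrt(a_diag)
M = inv_sqrt[:, None] * L_np * inv_sqrt[None, :]
evals_np, U = np.linalg.eigh(M)
evecs_np = inv_sqrt[:, None] * U    # v = A^{-1/2} u solves the generalized problem

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
l = torch.from_numpy(L_np)          # CPU tensors, as in the original snippet
a = torch.from_numpy(A_np)
evec = torch.from_numpy(evecs_np[:, 1]).to(device)
eval1 = torch.from_numpy(evals_np)[1].to(device)

res = generalized_residual(l, a, evec, eval1)
max_abs = res.abs().max().item()
print(max_abs)
```

If the residual is tiny on synthetic data on both devices, the arithmetic itself is fine and the problem more likely lies elsewhere in the program.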

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1595629403081/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered

However, if I transfer the tensors to the CPU with .cpu() first, the result is correct and there are no errors.
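One way to narrow this down is to run the same computation on CPU copies and on CUDA copies of the inputs and compare the outputs. The sketch below assumes a hypothetical helper `cpu_gpu_max_diff` and random inputs in place of the real template data; it calls `torch.cuda.synchronize()` right after the GPU computation so that an asynchronous CUDA error is raised at that point rather than at some later, unrelated call.

```python
import torch

def cpu_gpu_max_diff(fn, *tensors):
    """Run fn on CPU copies and, if available, CUDA copies of tensors; return the max abs difference."""
    cpu_out = fn(*(t.cpu() for t in tensors))
    if not torch.cuda.is_available():
        return None  # no GPU, nothing to compare against
    gpu_out = fn(*(t.cuda() for t in tensors))
    torch.cuda.synchronize()  # surface any pending asynchronous CUDA error here
    return (cpu_out - gpu_out.cpu()).abs().max().item()

def residual(l, a, evec, eval1):
    # Same expression as in the original snippet
    return l.matmul(evec) - eval1 * a.matmul(evec)

# Hypothetical random inputs standing in for the template data
torch.manual_seed(0)
n = 8
l = torch.randn(n, n, dtype=torch.float64)
a = torch.diag(torch.rand(n, dtype=torch.float64) + 1.0)
evec = torch.randn(n, dtype=torch.float64)
eval1 = torch.tensor(0.5, dtype=torch.float64)

diff = cpu_gpu_max_diff(residual, l, a, evec, eval1)
print(diff)
```

A `None` result just means no GPU was found; otherwise the difference should be at floating-point rounding level.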

My guess is that this happens because of a problem with GPU memory allocation when the complete data is inserted at once. Have you tried doing it in batches and checking whether the same error occurs for the batches?
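A rough sketch of the batching idea for a matrix-vector product: keep the full matrix on the CPU and move only one block of rows at a time to the device. The helper name `chunked_matvec` and the default `chunk_size` are hypothetical. For a product like `l.matmul(evec)` most of the memory is the matrix itself, so this mainly limits how much of it has to live on the GPU at once.

```python
import torch

def chunked_matvec(mat, vec, chunk_size=1024, device=None):
    """Compute mat @ vec block by block, moving one row block at a time to `device` (hypothetical helper)."""
    device = device if device is not None else vec.device
    vec = vec.to(device)
    parts = []
    for start in range(0, mat.shape[0], chunk_size):
        block = mat[start:start + chunk_size].to(device)  # only this row block is copied
        parts.append(block.matmul(vec))
    return torch.cat(parts)

m = torch.randn(10, 7, dtype=torch.float64)
v = torch.randn(7, dtype=torch.float64)
out = chunked_matvec(m, v, chunk_size=3)
print(out.shape)  # torch.Size([10])
```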

I don’t think I understand you :/ It’s not a complete data insertion; I’m only loading precomputed NumPy arrays and moving them to the GPU.
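For the loading path described here, a small sketch of turning a precomputed NumPy array into a device tensor. The helper `numpy_to_device` and the example array are hypothetical. `torch.from_numpy` returns a CPU tensor that shares memory with the array and keeps its dtype (float64 arrays become float64 tensors), and it rejects arrays with negative strides, so the sketch calls `np.ascontiguousarray` first, which does not copy arrays that are already C-contiguous.

```python
import numpy as np
import torch

def numpy_to_device(arr, device, dtype=None):
    """Convert a precomputed NumPy array to a tensor on `device` (hypothetical helper)."""
    # torch.from_numpy rejects negative strides (e.g. arr[::-1]), so normalize the layout first
    arr = np.ascontiguousarray(arr)
    t = torch.from_numpy(arr)  # CPU tensor sharing memory with arr
    if dtype is not None:
        return t.to(device=device, dtype=dtype)
    return t.to(device)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
laplacian = np.arange(9, dtype=np.float64).reshape(3, 3)[::-1]  # hypothetical stand-in, negative stride
l = numpy_to_device(laplacian, device, dtype=torch.float32)
print(l.device, l.dtype)
```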

Could you rerun your code with

    CUDA_LAUNCH_BLOCKING=1 python script.py args

and check the stack trace as well as the line of code that causes this issue?
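If it is easier to set this from inside the script, a sketch is to set the environment variable before CUDA is initialized; setting it before importing torch is the safest choice. CUDA kernels normally run asynchronously, so without it an error can be reported at a later call (such as the host allocator line in the log above) instead of the line that caused it. The `checked` wrapper is a hypothetical helper that additionally synchronizes after a suspicious operation.

```python
import os

# Must be set before CUDA is initialized; doing it before importing torch is the safest option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

def checked(fn, *args):
    """Run fn and, on GPU machines, synchronize so a pending CUDA error is raised here (hypothetical helper)."""
    out = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out

x = torch.ones(2)
y = checked(lambda t: t + 1, x)
print(y)  # tensor([2., 2.])
```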

I’m sorry ptrblck, but I cannot reproduce it anymore. Super strange.
Thanks for helping:)