CFFI extension doesn't work with DataParallel on multiple GPUs

Hello. I've tested my custom CFFI extension on a single GPU and it works fine. However, when using DataParallel, I always get this error:

THCudaCheck FAIL file=filter_kernel.cu line=61 error=77 : an illegal memory access was encountered
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/THCCachingHostAllocator.cpp line=258 error=77 : an illegal memory access was encountered
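
Roughly, my setup looks like the sketch below (the module and function names `filter_ext` and `filter_forward` are simplified placeholders for my actual extension; this is PyTorch 0.2-era code, hence `Variable`):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# Placeholder import for my compiled cffi extension.
from my_package._ext import filter_ext


class FilterModule(nn.Module):
    """Thin wrapper that calls the custom CUDA kernel in filter_kernel.cu."""

    def forward(self, input):
        output = input.new(input.size()).zero_()
        # The cffi call launches the CUDA kernel on the input's GPU.
        filter_ext.filter_forward(input, output)
        return output


model = FilterModule().cuda()
x = Variable(torch.randn(8, 3, 32, 32).cuda())

y = model(x)  # fine on a single GPU

model = nn.DataParallel(model, device_ids=[0, 1])
y = model(x)  # THCudaCheck FAIL ... error=77 : an illegal memory access
```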

Has anyone else run into this problem?