In my code, there are many places where a variable is transferred to the GPU with a .cuda() call, like
x = x.cuda()
When I begin training, the program always crashes at some point, but at a different one of these calls each time.
One example is like this:
h = h.cuda()
  return CudaTransfer(device_id, async)(self)
  return i.cuda(async=self.async)
  return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.10_1488755368782/work/torch/lib/THC/generic/THCTensorCopy.c:18
I really cannot understand what is going on here.
I also tried catching the exception and checking the variable right before the .cuda() call; the variable looks normal.
Can anyone help?
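For reference, this is roughly the kind of check I added before each transfer (a simplified sketch; safe_cuda is my own helper name, and I'm writing it against a newer PyTorch API than the 0.1.10 build in the traceback):

```python
import torch

def safe_cuda(x):
    # Sanity-check the tensor before moving it to the GPU; the
    # device-side assert usually comes from bad values created
    # earlier (e.g. an out-of-range index), not from .cuda() itself.
    assert torch.isfinite(x.float()).all(), "tensor has NaN/Inf values"
    if torch.cuda.is_available():
        return x.cuda()
    return x  # fall back to CPU when no GPU is present
```

Even with a check like this passing, the crash still happens, which is why I'm confused.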