RuntimeError: reduce failed to synchronize: an illegal memory access was encountered

When I run the pix2pix GAN implemented by eriklindernoren with PyTorch 0.4.1, I get the following RuntimeError:

Exception ignored in: <bound method _DataLoaderIter.__del__ of < object at 0x7f0f8f5490b8>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/", line 399, in __del__
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/", line 378, in _shutdown_workers
  File "/usr/lib/python3.5/multiprocessing/", line 345, in get
    return ForkingPickler.loads(res)
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.5/multiprocessing/", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/lib/python3.5/multiprocessing/", line 181, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/lib/python3.5/multiprocessing/", line 152, in recvfds
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(bytes_size))
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "", line 141, in <module>
    loss_GAN = criterion_GAN(pred_fake, valid)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 421, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/", line 1716, in mse_loss
    return _pointwise_loss(lambda a, b: (a - b) ** 2, torch._C._nn.mse_loss, input, target, reduction)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/", line 1674, in _pointwise_loss
    return lambd_optimized(input, target, reduction)
RuntimeError: reduce failed to synchronize: an illegal memory access was encountered

Why does this error occur? Can anyone help me?

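Try running your script with CUDA_LAUNCH_BLOCKING=1. CUDA kernels are launched asynchronously, so the traceback can point at a later operation than the one that actually failed. With blocking launches, the error should be reported at the offending call.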
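On the command line this is usually just `CUDA_LAUNCH_BLOCKING=1 python your_script.py`. If you would rather launch from Python, here is a minimal sketch. It assumes the training code lives in a standalone script; the helper names `blocking_env` and `run_blocking` are made up for illustration and are not part of PyTorch.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this variable set, each kernel launch blocks until it finishes,
    # so a failure is reported at the call that caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script_path, *script_args):
    """Run a script (hypothetical path) in a child process with that environment."""
    cmd = [sys.executable, script_path] + list(script_args)
    # The variable is set before the child process starts, i.e. before
    # CUDA is initialized there.
    return subprocess.call(cmd, env=blocking_env())
```

For example, `run_blocking("your_script.py")` runs the script once with blocking launches. Expect training to be slower in this mode, so it is best used only while debugging.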

Have you solved this problem? I ran into the same issue.

I ran into the same problem. In my case, the tensors were not all on the same GPU. After I put them on the same GPU id, the error went away.
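A quick way to check for this is to compare the `.device` of the tensors just before the failing call. The helper below is only a sketch: `assert_same_device` is a made-up name, and it assumes every argument exposes a `.device` attribute, as PyTorch tensors do since 0.4.

```python
def assert_same_device(**named_tensors):
    """Raise RuntimeError if the given tensors are not all on one device."""
    # Map each argument name to its device as a string, e.g. "cuda:0".
    devices = {name: str(t.device) for name, t in named_tensors.items()}
    if len(set(devices.values())) > 1:
        raise RuntimeError(
            "Tensors are on different devices: {}".format(devices)
        )
    return devices
```

In the script from the traceback, that would mean calling `assert_same_device(pred_fake=pred_fake, valid=valid)` right before `criterion_GAN(pred_fake, valid)`. If it raises, moving one tensor with something like `valid = valid.to(pred_fake.device)` may be enough.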

Try something like this, which makes only GPU 1 visible to the script:

CUDA_VISIBLE_DEVICES=1 python your_script.py