torch.cuda.streams.Stream in multiprocess context causes error: can't pickle Stream objects

Hello, I added data prefetching using a CUDA stream, like this:

class data_prefetcher():
    def __init__(self, loader):
        self.loader = iter(loader)
        self.stream = torch.cuda.Stream()
        self.mean = torch.tensor([0.485 * 255, 0.456 * 255, 0.406 * 255]).cuda().view(1, 3, 1, 1)
        self.std = torch.tensor([0.229 * 255, 0.224 * 255, 0.225 * 255]).cuda().view(1, 3, 1, 1)
        self.preload()

    def preload(self):
        try:
            self.next_input, self.next_target = next(self.loader)
        except StopIteration:
            self.next_input = None
            self.next_target = None
            return
        # Copy and normalize the next batch on the side stream
        with torch.cuda.stream(self.stream):
            self.next_input = self.next_input.cuda(non_blocking=True)
            self.next_target = self.next_target.cuda(non_blocking=True)
            self.next_input = self.next_input.float()
            self.next_input = self.next_input.sub_(self.mean).div_(self.std)

    def next(self):
        # Make the current stream wait for the prefetch work to finish
        torch.cuda.current_stream().wait_stream(self.stream)
        input = self.next_input
        target = self.next_target
        if input is not None:
            input.record_stream(torch.cuda.current_stream())
        if target is not None:
            target.record_stream(torch.cuda.current_stream())
        self.preload()
        return input, target

(the snippet comes from an existing example) and wired it into my training logic. It works fine in single-GPU training, but when I moved to a multiprocess context, I got an error like this:

  File "/usr/lib/python3.6/multiprocessing/", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/", line 32, in __init__
  File "/usr/lib/python3.6/multiprocessing/", line 19, in __init__
  File "/usr/lib/python3.6/multiprocessing/", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.6/multiprocessing/", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Stream objects

After debugging, I found that torch.cuda.streams.Stream triggered this exception. My questions are:

1. Is it possible to use a CUDA stream in a torch.multiprocessing context?
2. If not, are there any examples of the recommended approach?

Hi @Alex_Luya

If all you need is to synchronize streams across processes, you can use the ipc_handle() API to pass CUDA events across processes. See the example in the tests.
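A minimal sketch of that idea, assuming CUDA is available (the `worker` function and tensor work are illustrative; `torch.cuda.Event(interprocess=True)`, `Event.ipc_handle()`, and `Event.from_ipc_handle()` are the real APIs):

```python
import torch
import torch.multiprocessing as mp

def worker(device, handle):
    # Rebuild the event in the child process from its IPC handle
    event = torch.cuda.Event.from_ipc_handle(device, handle)
    # Block until the parent records the event and the GPU work
    # queued before it has completed
    event.synchronize()

if __name__ == "__main__":
    mp.set_start_method("spawn")  # CUDA requires the spawn start method
    device = torch.cuda.current_device()
    # interprocess=True is required for ipc_handle() to work
    event = torch.cuda.Event(interprocess=True)
    # Pass the picklable handle, not the Stream/Event object itself
    p = mp.Process(target=worker, args=(device, event.ipc_handle()))
    p.start()
    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream):
        x = torch.ones(1024, device="cuda") * 2  # some work on a side stream
    event.record(stream)  # the child wakes up once this point is reached
    p.join()
```

The key point is that the Stream and Event objects themselves are never pickled; only the small IPC handle crosses the process boundary, which sidesteps the `can't pickle Stream objects` error.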