PyTorch 0.3.0: shared tensor cannot be read in torch.multiprocessing

Hi,

I installed PyTorch 0.3.0 with the latest Anaconda3-5.0.1-Linux-x86_64 (Python 3.6) by running

conda install pytorch torchvision -c pytorch

When I try to read from a shared tensor in a sub-process, the read blocks.

import torch
import torch.multiprocessing as mp
from torch.multiprocessing import Process
import numpy as np

class MyProcess(Process):
  def __init__(self):
    super(MyProcess, self).__init__()
    self.batch_size = 256
    self.N = 443757
    # random row indices used for both reads
    self.idx = np.random.randint(0, self.N, size=(self.batch_size,)).tolist()
    # allocate the tensor and move its storage into shared memory
    self.a = torch.zeros([self.N, 3000])
    self.a.share_memory_()
    # this read, in the main process, works fine
    print('reading in main process ...')
    data = self.a[self.idx, :]
    print('done!')
    self.lock = mp.Lock()
  def run(self):
    with self.lock:
      print('reading in sub-process ...')
      data = self.a[self.idx, :]  # blocked on 0.3.0!!
      print('done!')
my_process = MyProcess()
my_process.start()
my_process.join()

On PyTorch 0.3.0 the output is

reading in main process ...
done!
reading in sub-process ...

But it works fine in PyTorch 0.2.0:

reading in main process ...
done!
reading in sub-process ...
done!

I also noticed that when self.batch_size is a small number (e.g. 8), the read works fine in both PyTorch 0.3.0 and PyTorch 0.2.0.
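For reference, a parametrized variant of the same script makes it easy to try different batch sizes (the class name SizedProcess is just for illustration; so far I have only tried 8 and 256, other sizes are untested):

import torch
import torch.multiprocessing as mp
from torch.multiprocessing import Process
import numpy as np

class SizedProcess(Process):
  def __init__(self, batch_size):
    super(SizedProcess, self).__init__()
    self.batch_size = batch_size
    self.N = 443757
    self.idx = np.random.randint(0, self.N, size=(self.batch_size,)).tolist()
    self.a = torch.zeros([self.N, 3000])
    self.a.share_memory_()
    self.lock = mp.Lock()
  def run(self):
    with self.lock:
      print('batch_size=%d: reading in sub-process ...' % self.batch_size)
      data = self.a[self.idx, :]
      print('done!')

# batch_size=8 finishes on both 0.2.0 and 0.3.0; with 256 the second join()
# never returns on 0.3.0 because the sub-process read hangs
for bs in (8, 256):
  p = SizedProcess(bs)
  p.start()
  p.join()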

Any suggestions about this issue would be appreciated.
