How to share memory for DataLoader when using multiprocessing?

I wrap my data with Dataset, then use DataLoader to enumerate over it. But because of the copy-on-write mechanism, my memory usage grows far higher than expected.
My problem can be simplified as follows:

import torch
from torch.utils.data import Dataset, DataLoader

class DataIter(Dataset):
    def __init__(self):
        # a large plain Python list (Python 2 range returns a list)
        self.data = range(90317731)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.Tensor([idx])

Then I use DataLoader with a for-loop to fetch the data.

train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=64,
                          shuffle=True, num_workers=8)

for i, item in enumerate(train_loader):
    pass  # training step goes here

While it is running, I watch my memory usage (RAM, RSS). It costs about 20 GB of RSS because of copy-on-write in the worker subprocesses. How do I deal with this? self.data = range(90317731) should only cost about 2~3 GB as a Python list. I know that using NumPy reduces the symptom: it shrinks the size of train_data, so each subprocess copies less.
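For reference, this is the NumPy-backed variant I mean (a minimal sketch; the class name NumpyDataIter and the size parameter n are mine). A single contiguous array replaces ~90M individual Python int objects, and reading it in a worker does not touch per-element reference counts, so the copy-on-write pages stay shared with the parent process:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class NumpyDataIter(Dataset):
    def __init__(self, n=90317731):
        # one contiguous int64 buffer instead of a Python list of ints;
        # iterating it in a worker process never writes to these pages,
        # so copy-on-write leaves them shared with the parent
        self.data = np.arange(n, dtype=np.int64)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.Tensor([self.data[idx]])
```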

To summarize my problems:

  • How can I reduce the memory cost incurred by the subprocesses due to copy-on-write? Should I use a Manager or something else?

  • PyTorch has only considered shared memory for Tensors, not for the Dataset class. Am I right?
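By "Tensor shared memory" I mean something like this minimal sketch: share_memory_() moves a tensor's backing storage into shared memory, so worker processes map the same pages instead of copying them.

```python
import torch

# move the tensor's backing storage into shared memory; DataLoader
# worker processes then map the same pages rather than copying them
data = torch.arange(0, 8)
data.share_memory_()

print(data.is_shared())  # True
```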

I'm using Python 2.7.