What are pin_memory and shared memory?

I recently read the DataLoader code and have some questions. I created an issue: https://github.com/pytorch/pytorch/issues/33754

I got a reply:

You shouldn't do memory pinning in workers. It requires CUDA context, and using CUDA in multiprocessing is advised against. In particular, in fork, it does not work, as you observed. Using spawn would solve the CUDA initialization issue, but the tensor will need to be moved to shared memory for transfer, rendering the pinning useless.

In general, you shouldn't need to speed up memory pinning, as the computation would be the major bottleneck, and multithreaded pinning should not be hurting you.
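For reference, here is a minimal sketch of what pinning does, assuming a CUDA-enabled PyTorch build. Tensor.pin_memory() copies a CPU tensor into page-locked (pinned) host memory, which lets the following copy to the GPU run asynchronously with non_blocking=True. With pin_memory=True and num_workers > 0, the DataLoader performs this step in a background thread of the main process rather than in the workers, which is presumably the multithreaded pinning the reply refers to. The helper name move_batch is hypothetical; without CUDA it returns the tensor unchanged.

```python
import torch


def move_batch(batch: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: pin a CPU batch, then start an async copy to the GPU."""
    if not torch.cuda.is_available():
        # Pinning needs a CUDA context; without a GPU, keep the ordinary CPU tensor.
        return batch
    pinned = batch.pin_memory()  # copy into page-locked host memory
    # Because the source is pinned, non_blocking=True lets the copy run asynchronously.
    return pinned.to("cuda", non_blocking=True)


if __name__ == "__main__":
    x = torch.randn(64, 3, 32, 32)
    y = move_batch(x)
    print(y.device)
```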
  1. What are pin_memory and shared memory? When I run the code with pin_memory=True, it occupies some GPU memory, but only a little.
  2. I saw the example for torch.multiprocessing at https://pytorch.org/docs/stable/notes/multiprocessing.html
    The example calls model.share_memory(). What does it do? I usually don't call it, and I haven't run into any problems.
```python
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
  3. Is the shared memory mentioned in the reply the same thing that share_memory() does?
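On questions 2 and 3, a small experiment may help, assuming a POSIX system where the fork start method is available (it is not on Windows). Module.share_memory() calls Tensor.share_memory_() on each parameter and buffer, moving its storage into shared memory. This is the same kind of shared memory that torch.multiprocessing uses when it sends a tensor to another process, for example under spawn. Under fork, a child's in-place update reaches the parent only if the storage was shared beforehand; otherwise the child writes to its own copy-on-write copy and no error is raised, which may be why skipping model.share_memory() seemed harmless. The helper names add_one and update_seen_by_parent are hypothetical.

```python
import torch
import torch.multiprocessing as mp


def add_one(t: torch.Tensor) -> None:
    # Runs in the child: update the tensor in place.
    t.add_(1)


def update_seen_by_parent(share: bool) -> bool:
    """Return True if the parent sees an in-place update made by a forked child."""
    t = torch.zeros(3)
    if share:
        t.share_memory_()  # the call model.share_memory() makes for each parameter
    ctx = mp.get_context("fork")  # assumption: POSIX system; fork is unavailable on Windows
    p = ctx.Process(target=add_one, args=(t,))
    p.start()
    p.join()
    return bool(torch.all(t == 1))


if __name__ == "__main__":
    print("plain tensor:", update_seen_by_parent(share=False))   # expected: False
    print("shared tensor:", update_seen_by_parent(share=True))   # expected: True

    model = torch.nn.Linear(2, 2)
    model.share_memory()
    print(all(p.is_shared() for p in model.parameters()))  # expected: True
```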

I tried using Process instead of spawn. Why doesn't pin_memory report an error?

```python
import time

from torch.multiprocessing import Process
from torch.utils.data import DataLoader

from image_dataset import ImageDataset  # my own dataset class


def image_reader():
    pytorch_loader = DataLoader(dataset=ImageDataset("./coco/train"),
                                batch_size=64,
                                shuffle=False,
                                sampler=None,
                                batch_sampler=None,
                                num_workers=4,
                                collate_fn=None,
                                pin_memory=True,
                                drop_last=False,
                                timeout=0,
                                worker_init_fn=None,
                                multiprocessing_context=None)

    # Count batches and report the elapsed time for every 100 of them.
    read_time = 0
    stime = time.time()
    for _ in pytorch_loader:
        read_time = read_time + 1
        if read_time % 100 == 0:
            etime = time.time()
            print("read time: {}, cost time: {}".format(read_time, etime - stime))
            stime = etime


def run():
    # Each child process iterates its own DataLoader; no start method is set
    # explicitly, so the platform default is used.
    plist = []
    for _ in range(5):
        p = Process(target=image_reader)
        p.start()
        plist.append(p)

    for p in plist:
        p.join()


if __name__ == "__main__":
    run()
```
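A likely reason no error appears: with pin_memory=True, the DataLoader pins batches in a background thread of the process that iterates it (here, each child started by run()), not inside its worker processes. Because the parent never initializes CUDA before starting the children, each child can create its own CUDA context. If the parent does touch CUDA first, the usual advice is to use the spawn start method. The sketch below shows that pattern with torch.multiprocessing.spawn; the synthetic TensorDataset stands in for ImageDataset, and the sizes are arbitrary assumptions.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset


def image_reader(rank: int, num_workers: int = 2) -> int:
    """Iterate a DataLoader in one process and return the number of batches read."""
    # Synthetic stand-in for ImageDataset: 256 random 3x32x32 "images".
    dataset = TensorDataset(torch.randn(256, 3, 32, 32))
    loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=num_workers,
        # Pin only when CUDA exists; pinning runs in this process, not in the workers.
        pin_memory=torch.cuda.is_available(),
    )
    num_batches = 0
    for _ in loader:
        num_batches += 1
    print(f"rank {rank}: read {num_batches} batches")
    return num_batches


if __name__ == "__main__":
    # spawn starts fresh interpreters, so a CUDA context in the parent is not inherited.
    # mp.spawn calls image_reader(rank) for each rank in range(nprocs).
    mp.spawn(image_reader, nprocs=5)
```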
  1. Have a look at this blog post to see an explanation of pinned memory.

  2. Shared memory can be used for inter-process communication (explained here).
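To make the IPC point concrete, here is a minimal sketch using Python's standard-library multiprocessing.shared_memory module (Python 3.8+). This is a general-purpose mechanism, separate from PyTorch's own tensor-sharing code. The parent creates a named block, a child attaches to it by name and writes into it, and the parent reads the bytes back; the data itself never passes through a pipe. The sketch assumes a POSIX system for the fork context, and writer and roundtrip are hypothetical names.

```python
from multiprocessing import get_context, shared_memory


def writer(name: str) -> None:
    # Child: attach to the existing block by name and write into it.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:5] = b"hello"
    shm.close()  # detach only; the parent still owns the block


def roundtrip() -> bytes:
    """Create a shared block, let a child process fill it, and read it back."""
    shm = shared_memory.SharedMemory(create=True, size=16)
    try:
        ctx = get_context("fork")  # assumption: POSIX system
        p = ctx.Process(target=writer, args=(shm.name,))
        p.start()
        p.join()
        return bytes(shm.buf[:5])
    finally:
        shm.close()
        shm.unlink()  # release the block once no process needs it


if __name__ == "__main__":
    print(roundtrip())  # b'hello'
```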