I recently read the DataLoader code and have some questions. I created an issue: https://github.com/pytorch/pytorch/issues/33754
I got a reply:

> You shouldn't do memory pinning in workers. It requires CUDA context, and using CUDA in multiprocessing is advised against. In particular, in fork, it does not work, as you observed. Using spawn would solve the CUDA initialization issue, but the tensor will need to be moved to shared memory for transfer, rendering the pinning useless. In general, you shouldn't need to speed up memory pinning, as the computation would be the major bottleneck, and multithreaded pinning should not be hurting you.
- I want to know: what are pin_memory and shared memory? When I run the code with `pin_memory=True`, it occupies some GPU memory, but only a little.
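For context, here is a minimal sketch (my own, not from the issue) of how `pin_memory` is normally used: the DataLoader collates each batch into page-locked (pinned) host RAM in the main process, which is what allows a later `.cuda(non_blocking=True)` copy to run asynchronously.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for a real one
dataset = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))

# pin_memory=True copies each collated batch into page-locked host RAM;
# guard on CUDA availability since pinning needs a CUDA-capable build.
loader = DataLoader(dataset, batch_size=16,
                    pin_memory=torch.cuda.is_available())

for data, labels in loader:
    if torch.cuda.is_available():
        # non_blocking=True only actually overlaps with compute
        # when the source tensor is pinned
        data = data.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)
    # ... forward / backward here
```

The small GPU memory usage you see with `pin_memory=True` is just the CUDA context being initialized, not the batches themselves.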
- I see `model.share_memory()` in the example below, and I want to know what it does. I usually don't call it, but I haven't run into any problems.
```python
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
- Is the shared memory mentioned in the reply the same thing that `model.share_memory()` puts the parameters into?
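To check my own understanding, here is a small sketch of what `share_memory_()` does to a tensor's storage; I use a plain `nn.Linear` in place of `MyModel` purely for illustration:

```python
import torch
import torch.nn as nn

t = torch.randn(4)
print(t.is_shared())   # False: ordinary process-private memory
t.share_memory_()      # moves the underlying storage into shared memory, in place
print(t.is_shared())   # True

# nn.Module.share_memory() just calls share_memory_() on every
# parameter and buffer, so fork-ed workers see (and update) the
# same underlying storage rather than private copies.
model = nn.Linear(2, 2)
model.share_memory()
print(all(p.is_shared() for p in model.parameters()))  # True
```

So, as far as I can tell, the shared memory in the reply (used to transfer tensors between processes) is the same OS-level mechanism that `model.share_memory()` moves the parameters into ahead of time.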