I recently read the DataLoader code and have some questions. I created an issue, https://github.com/pytorch/pytorch/issues/33754, and got this reply:
> You shouldn't do memory pinning in workers. It requires CUDA context, and using CUDA in multiprocessing is advised against. In particular, in fork, it does not work, as you observed. Using spawn would solve the CUDA initialization issue, but the tensor will need to be moved to shared memory for transfer, rendering the pinning useless.
>
> In general, you shouldn't need to speed up memory pinning, as the computation would be the major bottleneck, and multithreaded pinning should not be hurting you.
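To make the reply concrete, here is a minimal sketch of how pinning is normally requested (a toy `TensorDataset` I made up for illustration): you just pass `pin_memory=True` to the `DataLoader`, and pinning happens in the main process in a background thread, not in the workers.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset, just for illustration.
ds = TensorDataset(torch.randn(8, 2), torch.zeros(8))

# pin_memory=True copies each batch into page-locked (pinned) host RAM,
# so a later .to('cuda', non_blocking=True) can overlap with compute.
# Gated on CUDA availability so the sketch also runs on CPU-only boxes.
loader = DataLoader(ds, batch_size=4,
                    pin_memory=torch.cuda.is_available())

for data, labels in loader:
    # On a CUDA machine, data.is_pinned() would be True here.
    pass
```

Pinned (page-locked) host memory is what lets CUDA do asynchronous host-to-device copies; that is all `pin_memory` buys you.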
- I want to know what pin_memory and shared memory are. When I run the code with `pin_memory=True`, it occupies some GPU memory, but only a little.
- I saw the example for `torch.multiprocessing` in https://pytorch.org/docs/stable/notes/multiprocessing.html. The example calls `model.share_memory()`, and I want to know what it does. I usually don't use it, and I haven't noticed any problems.
```python
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
- Is the shared memory mentioned in the reply the same thing as `share_memory()`?
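For what it's worth, you can check this directly with `Tensor.is_shared()` (a minimal sketch; `share_memory_()` is the in-place tensor method that `Module.share_memory()` applies to every parameter and buffer):

```python
import torch

t = torch.zeros(3)
# A freshly created CPU tensor is not in shared memory.
assert not t.is_shared()

# share_memory_() moves the tensor's storage into shared memory,
# so other processes can map the same pages instead of getting
# their own copies.
t.share_memory_()
assert t.is_shared()
```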