Clarify what subprocesses can do with torch.multiprocessing

Looking at the DataLoader sources, it seems that CPU->GPU data transfers are triggered within a thread instead of a process. Are there any restrictions against calling Tensor.to("cuda:0") or Tensor.copy_() from within a worker process? What about Tensor.pin_memory()?

Bump (and I have edited the title for clarification).
It would be nice to have more documentation on this in case a custom dataloader is desired.
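
To make the question concrete, here is a minimal sketch of one conservative layout for a custom loader. It is an illustration, not the DataLoader implementation. The worker process only builds CPU tensors, and the parent process does the pinning and the CPU->GPU copy. The names produce_batches, move_to_device and load_all are hypothetical. The "spawn" start method is chosen because CUDA cannot be re-initialized in a forked subprocess, so the sketch would not break on that point if the parent had already used CUDA. Whether the pinning and copying could safely move into the worker is exactly the open question.

```python
import torch
import torch.multiprocessing as mp


def produce_batches(queue, done, num_batches):
    """Worker process (hypothetical): builds CPU tensors only, never touches CUDA."""
    for i in range(num_batches):
        queue.put(torch.full((2, 3), float(i)))
    queue.put(None)  # sentinel: no more batches
    done.wait()      # stay alive until the parent has received every tensor


def move_to_device(batch, device):
    """Parent process (hypothetical): pin only for a CUDA target, then copy."""
    if device.type == "cuda":
        batch = batch.pin_memory()  # page-locked host copy
    return batch.to(device, non_blocking=True)


def load_all(num_batches=3):
    """Spawn one worker and move each batch it produces to the target device."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    ctx = mp.get_context("spawn")
    queue, done = ctx.Queue(), ctx.Event()
    worker = ctx.Process(target=produce_batches, args=(queue, done, num_batches))
    worker.start()
    batches = []
    while True:
        batch = queue.get()
        if batch is None:
            break
        batches.append(move_to_device(batch, device))
    done.set()     # let the worker exit now that all tensors have arrived
    worker.join()
    return batches
```

With the "spawn" start method, load_all() has to be called from under an `if __name__ == "__main__":` guard in a script. On a machine without CUDA the sketch falls back to the CPU device and skips pinning.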