I am trying to run a simple regression example with PyTorch multiprocessing, following the example here: Multiprocessing best practices — PyTorch 1.10.0 documentation
However, a few things are unclear to me. The example page shows:
```python
for data, labels in data_loader:
    optimizer.zero_grad()
    loss_fn(model(data), labels).backward()
    optimizer.step()  # This will update the shared parameters
```
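For context, here is a minimal, self-contained version of what I am running. The linear model and toy batch are placeholders for my actual regression code; everything else follows the docs example:

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def train(model, data_loader):
    # Each process builds its own optimizer over the (shared) parameters.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = nn.Linear(10, 1)  # placeholder for my regression model
    model.share_memory()      # moves the parameters into shared memory

    # Placeholder dataset: one toy batch instead of a real DataLoader.
    data = torch.randn(64, 10)
    labels = data.sum(dim=1, keepdim=True)
    data_loader = [(data, labels)]

    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model, data_loader))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```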
- What is meant by "update the shared parameters"?
- Do the parameters update within each individual process, or, since the model is shared (`model.share_memory()` achieves that, right?), does `optimizer.step()` update the shared copy of the parameters?
- By default the model parameters must be shared, right? As the note says:
> If `torch.Tensor.grad` is not `None`, it is also shared.
- If it updates the shared copy, is the `backward()` call computing the loss over a single process, or the sum of the losses from all processes? If it is the total loss, how can I print it? (See the sketch after this list for what I mean by printing it.)
- If the model contains layers with conditional branches, how will the optimizer update the parameters?
- Does each process build its own backward graph? If so, how are the parameters updated?
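To clarify the printing question above, this is the kind of per-worker print I would try, as a variant of the `train()` sketch earlier (`rank` is just a hypothetical worker index I would pass in):

```python
import torch
import torch.nn as nn

def train(rank, model, data_loader):
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        optimizer.step()  # updates the shared parameters (as I understand it)
        # Is this only this worker's loss, or a total across all workers?
        print(f"worker {rank}: loss = {loss.item():.4f}")
```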
Sorry for the question dump, but I couldn't find proper answers, as tutorials and documentation on this are rather scarce.