import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
What exactly does shared memory mean here?
Is a separate model created for each process?
Or does each process use one shared model?
The Wikipedia article on shared memory may explain it in a more accessible way.
It's basically a memory pool that multiple processes can use to exchange information and data.
tensor.share_memory_() will move the tensor data to shared memory on the host so that it can be shared between multiple processes. It is a no-op for CUDA tensors, as described in the docs. I don't quite understand the "in a single GPU instead of multiple GPUs" part, as this type of shared memory is not used on the GPU (i.e. it's not CUDA kernel-level shared memory).
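For reference, here is a minimal sketch (not from this thread; the worker function is illustrative) of what "moving the data to shared memory" means in practice: after share_memory_(), parent and child operate on the same underlying storage, so an in-place write in the child is visible in the parent.

import torch
import torch.multiprocessing as mp

def worker(t):
    # In-place update; because the storage lives in shared memory,
    # the parent process sees this change as well.
    t += 1

if __name__ == '__main__':
    t = torch.zeros(3)
    t.share_memory_()        # move the underlying storage to shared memory
    print(t.is_shared())     # True
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()
    print(t)                 # tensor([1., 1., 1.]) -- the child's write is visible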
If all processes are independent, e.g. each process trains its own model and does not use model sharding, data parallelism, etc., then you should just launch your processes on the desired device. Since the GPU resources will be shared between processes, you would most likely see a slowdown compared to a single process using a single GPU.
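For illustration, a minimal sketch of what "just launch your processes on the desired device" could look like (function and variable names here are illustrative, not from the thread): every process builds its own model, so nothing is shared and share_memory() is not needed; if several processes target the same GPU, they compete for it, which is the slowdown mentioned above.

import torch
import torch.multiprocessing as mp
from model import MyModel

def run_independent(rank, device):
    # Each process owns a completely separate model, optimizer, and data pipeline.
    model = MyModel().to(device)
    # ... construct this process's own data_loader, optimizer, and training loop ...

if __name__ == '__main__':
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    num_processes = 4
    # mp.spawn passes the process rank as the first argument to run_independent.
    mp.spawn(run_independent, args=(device,), nprocs=num_processes)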
@ptrblck Hi, thank you for your kind reply. In my case, I need to calculate the Hessian matrix of the loss w.r.t. the model weights for each data point (using torch.autograd.grad()). There is no need to update the weights. Do I need to explicitly clone the model for each process to make sure the gradients are not shared?
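For illustration, a minimal sketch of the setup this question describes (names such as hessian_worker are illustrative, not from the thread): each worker deep-copies the model so its parameters, .grad buffers, and autograd graph stay private to that process, and then computes a per-sample Hessian of the loss w.r.t. the weights with torch.autograd.grad.

import copy
import torch

def hessian_worker(base_model, loss_fn, samples):
    # Private copy of the model: nothing about its parameters or graph is shared.
    model = copy.deepcopy(base_model)
    params = [p for p in model.parameters() if p.requires_grad]
    for x, y in samples:
        loss = loss_fn(model(x), y)
        # First derivatives with create_graph=True so they can be differentiated again.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        flat = torch.cat([g.reshape(-1) for g in grads])
        rows = []
        for i in range(flat.numel()):
            # Row i of the Hessian: d(flat[i]) / d(params).
            row = torch.autograd.grad(flat[i], params, retain_graph=True, allow_unused=True)
            rows.append(torch.cat([
                torch.zeros_like(p).reshape(-1) if r is None else r.reshape(-1)
                for r, p in zip(row, params)
            ]))
        hessian = torch.stack(rows)
        # ... store or post-process ``hessian`` for this data point ...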