I am following the Hogwild example from multiprocessing best practices:
```python
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
and I've adapted it for my reinforcement learning code. Whether or not the model learns correctly turns out to depend on how I construct the optimizer.
What works: I define `optim = Adam(model.parameters(), lr=.005)` in the main process and pass it into `train` when creating the processes, i.e. `p = mp.Process(target=train, args=(model, optim))`. Then, at the start of `train`, I make a copy of `optim` local to each subprocess, i.e. `optim = deepcopy(optim)`.
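For what it's worth, a minimal self-contained check (using a bare `Parameter` in place of my actual model) confirms that each `deepcopy` of the optimizer carries its own, independent `Adam` state:

```python
import torch
from copy import deepcopy
from torch.optim import Adam

p = torch.nn.Parameter(torch.zeros(3))
opt = Adam([p], lr=.005)

# Per-process copy, as in the working setup.
local = deepcopy(opt)
p_local = local.param_groups[0]['params'][0]
p_local.grad = torch.ones(3)
local.step()

# The copy now holds Adam state for its parameter;
# the original optimizer in the main process still has none.
print(len(local.state))  # 1
print(len(opt.state))    # 0
```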
What doesn't work: I directly define `optim = Adam(model.parameters(), lr=.005)` inside `train`. Note: when I do this with `SGD`, the model does learn!
Is there an obvious explanation as to why `Adam` doesn't work in the second approach?