I am following the Hogwild example from multiprocessing best practices:
```python
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
and I’ve adapted it in the context of my reinforcement learning code. The way I construct my optimizer determines whether or not the model learns correctly.

**What works:** I define `optim = Adam(model.parameters(), lr=.005)` in the main process and pass it into `train` when creating the processes, i.e. `p = mp.Process(target=train, args=(model, optim))`. Then, at the start of `train`, I make a copy of `optim` local to each subprocess, i.e. `optim = deepcopy(optim)`.
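To make that concrete, here is a stripped-down sketch of the working variant. `MyModel`, `data_loader`, and `loss_fn` are as in the example above; only the optimizer handling differs:

```python
import torch.multiprocessing as mp
from copy import deepcopy
from torch.optim import Adam
from model import MyModel

def train(model, optim):
    # Each subprocess gets its own local copy of the optimizer;
    # the model parameters themselves remain in shared memory.
    optim = deepcopy(optim)
    # construct data_loader, loss_fn, etc. here, as in the example above
    for data, labels in data_loader:
        optim.zero_grad()
        loss_fn(model(data), labels).backward()
        optim.step()  # updates the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    model.share_memory()
    optim = Adam(model.parameters(), lr=.005)  # created once, in the main process
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model, optim))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```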
**What doesn’t work:** I define `optim = Adam(model.parameters(), lr=.005)` directly inside `train`. Note: when I do this with `SGD`, the model does learn!
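In other words, the failing variant changes only `train` (the processes are spawned with `args=(model,)` as in the original example, and no optimizer is constructed in the main process):

```python
from torch.optim import Adam, SGD

def train(model):
    # Optimizer constructed locally in each subprocess:
    optim = Adam(model.parameters(), lr=.005)   # model does not learn
    # optim = SGD(model.parameters(), lr=.005)  # with SGD it does learn
    # construct data_loader, loss_fn, etc., as in the example above
    for data, labels in data_loader:
        optim.zero_grad()
        loss_fn(model(data), labels).backward()
        optim.step()
```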
Is there an obvious explanation for why `Adam` doesn’t work in the second approach?