I am following the Hogwild example from multiprocessing best practices:
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
and I’ve adapted it in the context of my reinforcement learning code. The way I construct my optimizer determines whether or not the model learns correctly.
What works: I define optim = Adam(model.parameters(), lr=.005) in the main process and pass it into train when creating the processes, i.e. p = mp.Process(target=train, args=(model, optim)). Then, at the start of train, I make a copy of optim local to each subprocess, i.e. optim = deepcopy(optim).
What doesn’t work: I directly define optim = Adam(model.parameters(), lr=.005) inside train. Note: when I do this with SGD, the model does learn!
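For concreteness, here is a minimal sketch of the two setups. `TinyModel`, the random data, the MSE loss, `run_steps`, and `launch` are hypothetical stand-ins for illustration, not the actual RL code, and the sketch only shows how each optimizer is constructed; it is not expected to reproduce the learning difference by itself.

```python
from copy import deepcopy

import torch
import torch.multiprocessing as mp
from torch import nn
from torch.optim import Adam


class TinyModel(nn.Module):
    """Hypothetical stand-in for MyModel."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x):
        return self.linear(x)


def run_steps(model, optim, steps=5):
    """Placeholder training loop on random data; returns the per-step losses."""
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        data, labels = torch.randn(8, 4), torch.randn(8, 1)
        optim.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        optim.step()
        losses.append(loss.item())
    return losses


def train_copied_optim(model, optim):
    # Setup 1 (reported as working): the optimizer is built in the main
    # process, passed in, and deep-copied inside each subprocess.
    optim = deepcopy(optim)
    return run_steps(model, optim)


def train_local_optim(model):
    # Setup 2 (reported as not working with Adam): the optimizer is built
    # directly inside each subprocess.
    optim = Adam(model.parameters(), lr=.005)
    return run_steps(model, optim)


def launch(target, args, num_processes=2):
    """Start num_processes workers running target(*args) and wait for them."""
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=target, args=args)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```

To run either setup, build the model and call `model.share_memory()` under `if __name__ == '__main__':`, then call `launch(train_copied_optim, (model, optim))` with `optim = Adam(model.parameters(), lr=.005)` for the first setup, or `launch(train_local_optim, (model,))` for the second.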
Is there an obvious explanation for why Adam doesn’t work in the second approach?