Hello, I want to use the multiprocessing module to send autograd graphs to multiple worker processes so that I can backpropagate the same samples through several different loss functions at the same time.
Is there a recommended way to do this?
Could I pass the model's outputs and targets into the args of mp.Process and then have each worker run criterion(outputs, targets).backward()?
Right now I'm doing exactly that: the first .backward() call works fine, but the second .backward() call makes my program freeze.
Just a guess… have you tried .backward(retain_graph=True)?
By default, the first backward() call frees the autograd graph to save memory, so when a second backward() runs on the same graph there is nothing left to backpropagate through and it fails.
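A minimal single-process sketch of what retain_graph=True buys you (the tensor names here are illustrative, not from your code):

```python
import torch

# A tiny graph: y = x^2, so dy/dx = 2x.
x = torch.tensor([3.0], requires_grad=True)
y = x ** 2

# First backward: keep the graph alive for a second pass.
y.backward(retain_graph=True)
first_grad = x.grad.clone()

# Zero the accumulated grad, then backward a second time.
# This is the last pass, so the graph may now be freed.
x.grad.zero_()
y.backward()
second_grad = x.grad.clone()
```

Without retain_graph=True on the first call, the second y.backward() raises a RuntimeError about trying to backward through the graph a second time.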
Why not just run this in a single process? A lot of people do something like this…
loss = criterion1(outputs, targets) + ... + criterion_n(outputs, targets)
loss.backward()
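As a runnable sketch of that summed-loss pattern (the Linear model and the two criteria here are just stand-ins for your own):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)
outputs = model(inputs)

mse = nn.MSELoss()
l1 = nn.L1Loss()

# One backward pass through the combined loss deposits the
# sum of the per-criterion gradients into each .grad field.
loss = mse(outputs, targets) + l1(outputs, targets)
loss.backward()
combined_grad = model.weight.grad.clone()
```

Because gradients are linear, this is equivalent to backpropagating each criterion separately and adding the resulting .grad tensors.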
But I imagine you want to do something different with each set of grads.
optimizer.zero_grad()
criterion1(outputs, targets).backward(retain_graph=True)
# do something with first set of grads
optimizer.zero_grad()
criterion2(outputs, targets).backward(retain_graph=True)
# do something with second set of grads
...
optimizer.zero_grad()
criterion_n(outputs, targets).backward()
# do something with nth set of grads
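Putting that loop into a runnable sketch (the model, optimizer, and two criteria are placeholders for your own; the key point is one forward pass, then retain_graph=True on every backward except the last):

```python
import torch
import torch.nn as nn

# Hypothetical setup standing in for criterion1..criterion_n above.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criteria = [nn.MSELoss(), nn.L1Loss()]

inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)
outputs = model(inputs)  # forward once; all criteria reuse this graph

per_criterion_grads = []
for i, criterion in enumerate(criteria):
    optimizer.zero_grad()
    # Retain the graph on every pass except the last one.
    criterion(outputs, targets).backward(
        retain_graph=(i < len(criteria) - 1)
    )
    # "Do something" with this set of grads: here, just snapshot them.
    per_criterion_grads.append(
        [p.grad.clone() for p in model.parameters()]
    )
```

Each entry of per_criterion_grads then holds the gradients that one criterion alone would have produced, with no multiprocessing involved.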