Hello, I want to use the multiprocessing module to communicate autograd graphs to multiple nodes so that I can backpropagate the same problem samples at the same time using different loss functions.
Is there a recommended way to do this?
Could I pass the outputs and targets of the model into the args of mp.Process and then have each node perform criterion(outputs, targets).backward()?
Right now I’m using this method and the first .backward() call works fine, but the second time I call .backward(), my program freezes.
Any suggestions or thoughts are welcome.
Just a guess… Have you tried passing retain_graph=True to backward()? When the first process runs backward() it discards (frees) the autograd graph, so when the second process tries to run backward() it gets horribly confused.
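Here is a minimal single-process sketch of that behavior, with plain tensors instead of a model, just to show what retain_graph=True changes:

```python
import torch

# Tiny autograd graph: y = x^2, so dy/dx = 2x = 4 at x = 2.
x = torch.tensor([2.0], requires_grad=True)
y = (x * x).sum()

# First backward: retain_graph=True keeps the graph alive.
y.backward(retain_graph=True)
first_grad = x.grad.item()  # 4.0

# Second backward on the same graph; without retain_graph=True above,
# this call would fail because the graph would already have been freed.
y.backward()
second_grad = x.grad.item()  # grads accumulate: 4.0 + 4.0 = 8.0
```

Note that repeated backward() calls accumulate into .grad, so you generally want to copy or zero the grads between calls.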
Why not just run this in a single process? A lot of people do something like this…
loss = criterion1(outputs, targets) + ... + criterion_n(outputs, targets)
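Fleshed out into a self-contained sketch (the model and the two criteria here are illustrative, not from your code):

```python
import torch
import torch.nn as nn

# Illustrative model and data.
model = nn.Linear(3, 1)
outputs = model(torch.randn(4, 3))
targets = torch.randn(4, 1)

criterion1 = nn.MSELoss()
criterion2 = nn.L1Loss()

# One forward pass, one graph, one backward: the resulting grads are
# the sums of the per-criterion grads.
loss = criterion1(outputs, targets) + criterion2(outputs, targets)
loss.backward()
```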
But I imagine you want to do something different with each set of grads.
loss1 = criterion1(outputs, targets)
loss1.backward(retain_graph=True)
# do something with first set of grads
...
loss_n = criterion_n(outputs, targets)
loss_n.backward()  # last backward can free the graph
# do something with nth set of grads
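As a runnable sketch of that per-criterion pattern (the model, the criteria, and the "do something" step of copying grads out are my assumptions, not your code):

```python
import torch
import torch.nn as nn

# Illustrative setup.
model = nn.Linear(3, 1)
outputs = model(torch.randn(8, 3))
targets = torch.randn(8, 1)
criteria = [nn.MSELoss(), nn.L1Loss()]

grad_sets = []
for i, criterion in enumerate(criteria):
    loss = criterion(outputs, targets)
    # Retain the graph on every pass except the last, so each
    # criterion can backprop through the same forward pass.
    loss.backward(retain_graph=(i < len(criteria) - 1))
    # Copy this criterion's grads out, then clear them so the next
    # backward() doesn't accumulate on top of them.
    grad_sets.append([p.grad.clone() for p in model.parameters()])
    model.zero_grad()
```

The model.zero_grad() between backward() calls is the important part: without it, each set of grads would include all the previous criteria's grads as well.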