How can I communicate autograd graphs between nodes?

Hello, I want to use the multiprocessing module to communicate autograd graphs to multiple nodes, so that I can backpropagate through the same samples simultaneously with different loss functions.

Is there a recommended way to do this?

Could I pass the outputs and targets of the model into the args of mp.Process, and then have each node call criterion(outputs, targets).backward()?

Right now I’m using this method and the first .backward() call works fine, but the second time I call .backward(), my program freezes.
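In sketch form, what I’m doing looks roughly like this (simplified; the model and the criterions list are just placeholders for my real setup):

import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(criterion, outputs, targets):
    # each node backpropagates its own loss through the shared graph
    criterion(outputs, targets).backward()

if __name__ == "__main__":
    model = nn.Linear(4, 2)
    outputs = model(torch.randn(8, 4))
    targets = torch.randn(8, 2)
    criterions = [nn.MSELoss(), nn.L1Loss()]  # placeholders for my real losses

    processes = [mp.Process(target=worker, args=(c, outputs, targets)) for c in criterions]
    for p in processes:
        p.start()
    for p in processes:
        p.join()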

Any suggestions or thoughts are welcome.

Thank you

Just a guess… Have you tried .backward(retain_graph=True)?

By default, backward() frees the autograd graph’s saved buffers as soon as it finishes. So when the first process runs backward() it discards the graph, and when the second process tries to run backward() there is nothing left to backpropagate through.
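You can reproduce the same failure in a single process. A minimal sketch with a toy tensor:

import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()

# retain_graph=True keeps the graph's saved buffers alive after backward
loss.backward(retain_graph=True)

# works; without retain_graph above, this second call raises
# "Trying to backward through the graph a second time"
loss.backward()

Note that the gradients accumulate into x.grad across the two calls, so zero them in between if that’s not what you want.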

Why not just run this in a single process? A lot of people do something like this…

loss = criterion1(outputs, targets) + ... + criterion_n(outputs, targets)
loss.backward()

But I imagine you want to do something different with each set of grads, in which case:

optimizer.zero_grad()
criterion1(outputs, targets).backward(retain_graph=True)
# do something with first set of grads

optimizer.zero_grad()
criterion2(outputs, targets).backward(retain_graph=True)
# do something with second set of grads
...
optimizer.zero_grad()
criterion_n(outputs, targets).backward()  # no retain_graph on the last backward, so the graph can finally be freed
# do something with nth set of grads
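
For concreteness, here is that pattern as a runnable sketch; the model, data, and the two criteria are placeholders:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criteria = [nn.MSELoss(), nn.L1Loss()]  # stand-ins for criterion1 ... criterion_n

inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)
outputs = model(inputs)  # one forward pass, one shared graph

for i, criterion in enumerate(criteria):
    optimizer.zero_grad()
    last = (i == len(criteria) - 1)
    # retain the graph on every backward except the last one
    criterion(outputs, targets).backward(retain_graph=not last)
    grads = [p.grad.clone() for p in model.parameters()]
    # do something with this set of grads

Since zero_grad() runs before each backward, each grads list holds the gradients of one criterion alone rather than an accumulated sum.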