I have the following case: I want to add a loss on top of two DataParallel modules and train only m1:
```python
m1 = nn.DataParallel(m1)
m2 = nn.DataParallel(m2)

m1_loss, m1_out = m1(input_data)
m2_out = m2(input_data)

added_loss = some_operation(m1_out, m2_out)
loss = m1_loss + added_loss
loss.backward()
```
some_operation involves a conv layer. How can I do this? If I define that conv layer inside some_operation without wrapping it in DataParallel, will that cause a problem during backprop?
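For context, here is a minimal sketch of the setup I have in mind. The module classes (`M1`, `M2`, `SomeOperation`) and the shapes are hypothetical stand-ins, not my real models; the point is that `some_operation` is an `nn.Module` holding the extra conv layer, `m2` is frozen via `requires_grad_(False)` so only `m1` (and the extra conv) receive gradients, and the whole thing backprops through both DataParallel wrappers:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for m1; returns (loss, output) as in the question.
class M1(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        out = self.conv(x)
        return out.abs().mean(), out  # (m1_loss, m1_out)

# Hypothetical stand-in for m2 (to be kept frozen).
class M2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

# some_operation as a module that owns the extra conv layer.
class SomeOperation(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(8, 8, 3, padding=1)

    def forward(self, a, b):
        return self.conv(a - b).pow(2).mean()

m1, m2, some_operation = M1(), M2(), SomeOperation()

# Freeze m2 so backprop does not touch its parameters.
for p in m2.parameters():
    p.requires_grad_(False)

m1 = nn.DataParallel(m1)
m2 = nn.DataParallel(m2)

# Optimize m1 plus the conv inside some_operation (it has trainable weights too).
optimizer = torch.optim.SGD(
    list(m1.parameters()) + list(some_operation.parameters()), lr=0.01)

input_data = torch.randn(4, 3, 16, 16)
m1_loss, m1_out = m1(input_data)
m2_out = m2(input_data)

loss = m1_loss + some_operation(m1_out, m2_out)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

As I understand it, leaving `some_operation` unwrapped should still be correct for backprop, because DataParallel's gather of `m1_out`/`m2_out` onto the default device is itself differentiable; the cost is that the extra conv then runs on a single device, which can become a memory/speed bottleneck on GPU 0. Please correct me if that reasoning is wrong.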