What happens when you call step() on one network's optimizer, but the backprop went through a different network? The architectures of the two networks could be different. Here is some sample pseudocode:
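net_1 = Net_1()
net_2 = Net_2()
optimizer_1 = Optimizer(net_1.params)
optimizer_2 = Optimizer(net_2.params)
outputs = net_1(inputs)  # call the instance, not the class
loss = CrossEntropyLoss(outputs, targets)
loss.backward()  # gradients are computed for net_1
optimizer_2.step()  # but the step is taken by net_2's optimizer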
I backprop through net_1, but I am stepping with the optimizer for net_2.
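
One way to check this empirically is a minimal sketch like the one below. It assumes the pseudocode is PyTorch (the question does not say so explicitly), and it uses `nn.Linear` and a small `nn.Sequential` as hypothetical stand-ins for Net_1 and Net_2, with `torch.optim.SGD` standing in for the unspecified Optimizer. Under those assumptions, `backward()` only fills `.grad` on net_1's parameters, and `optimizer_2.step()` skips net_2's parameters because their `.grad` is still `None`, so neither network's weights change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for Net_1 and Net_2, with different architectures.
net_1 = nn.Linear(4, 3)
net_2 = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

# SGD is an assumption; the question only says "Optimizer".
optimizer_1 = torch.optim.SGD(net_1.parameters(), lr=0.1)
optimizer_2 = torch.optim.SGD(net_2.parameters(), lr=0.1)

inputs = torch.randn(5, 4)
targets = torch.randint(0, 3, (5,))

# Snapshot both networks' parameters before the update.
before_1 = [p.detach().clone() for p in net_1.parameters()]
before_2 = [p.detach().clone() for p in net_2.parameters()]

loss = nn.CrossEntropyLoss()(net_1(inputs), targets)
loss.backward()      # fills .grad on net_1's parameters only
optimizer_2.step()   # optimizer_2 only holds net_2's parameters

net_1_has_grads = all(p.grad is not None for p in net_1.parameters())
net_2_has_grads = any(p.grad is not None for p in net_2.parameters())
net_1_unchanged = all(torch.equal(a, b) for a, b in zip(before_1, net_1.parameters()))
net_2_unchanged = all(torch.equal(a, b) for a, b in zip(before_2, net_2.parameters()))

print(net_1_has_grads, net_2_has_grads, net_1_unchanged, net_2_unchanged)
```

Note that the gradients stay stored on net_1's parameters after this. If a later `backward()` runs through net_1 without `optimizer_1.zero_grad()` being called first, the new gradients are added to the old ones.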