Proper autograd with split network

I’m building a network with similar logic to that thread, but it doesn’t quite solve my issue.

My network (simple, fully connected) in bad pseudocode:

common_layer = f(input)

branch1 = common_layer
branch1 = f(branch1)

branch2 = common_layer
branch2 = f(branch2)

output = a*branch1 + (1-a)*branch2

Is this construction guaranteed to backpropagate properly from the output through the branches to the common layer? Do I need to make a deepcopy instead of just assigning branch1 = common_layer and branch2 = common_layer? Anything else that jumps out?

The network seems to be learning, but if I add layers I see no real change in behavior, which makes me think there is something funky with the branching and backprop.



That will work fine. You can directly do:

common_layer = f(input)

branch1 = f(common_layer)

branch2 = f(common_layer)

output = a*branch1 + (1-a)*branch2

This works as long as f is an nn.Module without unusual hooks, or a plain Python function. Assigning branch1 = common_layer does not copy anything; both branches read the same tensor, and autograd records each use, so gradients from both branches accumulate into the common layer during backward. No deepcopy is needed (and deepcopying would actually break the shared trunk).
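A minimal runnable sketch of this pattern (layer sizes and the mixing weight `a` are made up here) that you can use to confirm gradients reach the shared layer from both branches:

```python
import torch
import torch.nn as nn

class SplitNet(nn.Module):
    """Shared trunk feeding two branches, mixed by a fixed weight a."""

    def __init__(self, d_in=4, d_hidden=8, d_out=2, a=0.3):
        super().__init__()
        self.common = nn.Linear(d_in, d_hidden)
        self.branch1 = nn.Linear(d_hidden, d_out)
        self.branch2 = nn.Linear(d_hidden, d_out)
        self.a = a

    def forward(self, x):
        h = torch.relu(self.common(x))  # shared activation; no copy needed
        b1 = self.branch1(h)            # autograd records both uses of h
        b2 = self.branch2(h)
        return self.a * b1 + (1 - self.a) * b2

net = SplitNet()
out = net(torch.randn(5, 4))
out.sum().backward()

# Gradients flowed through both branches into the common layer.
print(net.common.weight.grad is not None)   # True
print(net.branch1.weight.grad is not None)  # True
print(net.branch2.weight.grad is not None)  # True
```

If added layers change nothing, printing `p.grad.norm()` for each parameter in `net.named_parameters()` after a backward pass is a quick way to see whether gradients are actually reaching the deeper layers.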
