I’m building a network with logic similar to what’s discussed here, but that thread doesn’t quite solve my issue.
My network (simple, fully connected) in bad pseudocode:
```
common_layer = f(input)
branch1 = common_layer
branch1 = f(branch1)
branch2 = common_layer
branch2 = f(branch2)
output = a*branch1 + (1-a)*branch2
```
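In real code it looks roughly like this; a minimal PyTorch sketch, where I'm assuming for illustration that each f is a Linear layer followed by a ReLU and that a is a fixed scalar (the dimensions are made up):

```python
import torch
import torch.nn as nn

class BranchedNet(nn.Module):
    def __init__(self, in_dim=16, hidden_dim=32, out_dim=8, a=0.5):
        super().__init__()
        self.common = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.branch1 = nn.Sequential(nn.Linear(hidden_dim, out_dim), nn.ReLU())
        self.branch2 = nn.Sequential(nn.Linear(hidden_dim, out_dim), nn.ReLU())
        self.a = a

    def forward(self, x):
        common = self.common(x)    # shared trunk
        b1 = self.branch1(common)  # plain assignment: both branches read
        b2 = self.branch2(common)  # the same tensor, no copy is made
        return self.a * b1 + (1 - self.a) * b2
```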
Is this construction guaranteed to backpropagate properly from the output through the branches to the common layer? Do I need to make a deepcopy instead of just assigning branch1 = common_layer and branch2 = common_layer? Anything else that jumps out?
The network seems to be learning, but when I add layers I see no real change in behavior, which makes me think there is something funky with the branching and backprop.
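For what it's worth, this is the kind of check I've been running to see whether gradients actually reach the common layer (using the sketch above; the input size is arbitrary):

```python
net = BranchedNet()
x = torch.randn(4, 16)
loss = net(x).sum()
loss.backward()
for name, p in net.named_parameters():
    # nonzero grads on common.* would mean backprop does reach the trunk
    print(name, p.grad.abs().max().item())
```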