Hi, everyone. I am working on a dynamic programming problem in which a nested neural network is involved. For the sake of understanding my question, simply assume the loss is
L = G(k') - H(k'')
where G and H are two functions whose details we do not need, and the variables k' and k'' come from the same neural network NNs such that
k' = NNs(k), k'' = NNs(k')
so k is the original input. In other words, to obtain the output k'' the same neural network is applied twice, i.e. nested once:
k → NNs → k' → NNs → k''
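For concreteness, here is a minimal PyTorch sketch of the setup. The layer sizes and the placeholder definitions of G and H are purely illustrative, not my actual code:

```python
import torch
import torch.nn as nn

# Illustrative network standing in for NNs (architecture is an assumption)
NNs = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 4))

def G(x):  # placeholder for the first loss term
    return x.pow(2).sum()

def H(x):  # placeholder for the second loss term
    return x.sum()

k = torch.randn(4)   # original input k
k1 = NNs(k)          # k'  = NNs(k)
k2 = NNs(k1)         # k'' = NNs(k')
L = G(k1) - H(k2)    # L = G(k') - H(k'')
L.backward()         # one backward pass through the whole composed graph
```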
My questions are simple:
(1) How does backpropagation work in this nested setting (one backward pass for each transition, I guess)? Does it work the same way as in the regular, non-nested case?
(2) Could this nested setup slow down the computation?
(3) What could go wrong under this nested setup?