Hi, everyone. I am working on a dynamic programming problem in which a nested neural network is implemented. To make my question easier to follow, simply assume the loss is

L = G(k') - H(k'')

where G and H are two functions whose details we do not need to know, and the variables k' and k'' come from the same neural network NNs such that

k' = NNs(k), k'' = NNs(k')

so k is the original input. In other words, to obtain the output k'', the same neural network is nested once:

k → NNs → k' → NNs → k''
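To make the setup concrete, here is a minimal PyTorch sketch of what I mean. The network architecture and the functions G and H are just placeholders, since in my actual problem they are more complicated:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder network standing in for NNs; the real architecture differs.
nns = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))

def G(x):  # placeholder for G
    return (x ** 2).sum()

def H(x):  # placeholder for H
    return x.sum()

k = torch.tensor([[1.0]])
k1 = nns(k)    # k'  = NNs(k)
k2 = nns(k1)   # k'' = NNs(k') -- the SAME network applied a second time

loss = G(k1) - H(k2)
loss.backward()  # a single backward() call; gradients from both
                 # applications of NNs accumulate in the shared parameters

# After backward(), every parameter of nns has a gradient populated.
print(all(p.grad is not None for p in nns.parameters()))
```

Note that in this sketch a single `loss.backward()` covers the whole nested computation; the shared weights receive gradient contributions from both the k → k' and the k' → k'' applications.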

My question is simple:

(1) How does backpropagation work in this nested-network setting? (I would guess one backpropagation pass for each transition.) Does it work the same as in the regular, non-nested case?

(2) Could this nested setup slow down the computation?

(3) What could go wrong under this nested setup?