Hello, I want to train a cascade of two separate models.
- train model A (no problem)
- train the cascade model:
(a) input → model A → output of A → model B → output of B
(b) some layer weights of A and B are shared
(c) loss = loss(output of A, target) + loss(output of B, target)
(d) the gradient should flow from B to A.
This results in gradient explosion.
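For reference, here is a minimal sketch of the setup described above. The module and variable names (ModelA, ModelB, shared) are placeholders, not my actual code; the real models are larger.

```python
import torch
import torch.nn as nn

# Layer whose weights are shared between A and B (point (b)):
# assigning the same nn.Module instance to both models ties the weights.
shared = nn.Linear(8, 8)

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = shared
        self.head = nn.Linear(8, 8)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = shared  # same module object as in A
        self.head = nn.Linear(8, 8)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

model_a, model_b = ModelA(), ModelB()
x = torch.randn(4, 8)
target = torch.randn(4, 8)

out_a = model_a(x)      # (a) input -> model A
out_b = model_b(out_a)  # (a) output of A -> model B, so (d) grads flow B -> A

criterion = nn.MSELoss()
# (c) combined loss on both outputs
loss = criterion(out_a, target) + criterion(out_b, target)
loss.backward()

# The shared layer accumulates gradients from both loss terms,
# through A's path and through B's path.
print(shared.weight.grad.norm())
```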
I couldn’t find what causes this problem. I found a similar question, but it didn’t answer mine (Strategies to debug exploding gradients in pytorch - #7 by mangoxb).
I would appreciate your help or any advice.