Training a cascaded model

Hello, I want to train two separate models in a cascade.

  1. train model A (no problem)
  2. train the cascade model
    (a) input → model A → output of A → model B → output of B
    (b) some layer weights of A and B are shared
    (c) total loss = loss(output of A, target) + loss(output of B, target)
    (d) the gradient should flow from B back to A
    This results in gradient explosion.
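To make the setup concrete, here is a minimal sketch of what I mean (layer names, sizes, and the MSE loss are just placeholders, not my actual code): the same `nn.Linear` object is used inside both models so its weights are shared, A's output feeds B, and the two losses are summed so gradients flow from B back through A.

```python
import torch
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self, shared):
        super().__init__()
        self.shared = shared            # layer whose weights are shared with B
        self.head = nn.Linear(16, 16)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

class ModelB(nn.Module):
    def __init__(self, shared):
        super().__init__()
        self.shared = shared            # same Module object => shared weights
        self.head = nn.Linear(16, 16)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

shared = nn.Linear(16, 16)
model_a, model_b = ModelA(shared), ModelB(shared)
criterion = nn.MSELoss()

x = torch.randn(8, 16)
target = torch.randn(8, 16)

out_a = model_a(x)       # (a) input -> model A
out_b = model_b(out_a)   # (a) output of A -> model B
# (c) sum of both losses; (d) backward sends gradients from B through A
loss = criterion(out_a, target) + criterion(out_b, target)
loss.backward()

# the shared layer accumulates gradients from both paths
print(shared.weight.grad is not None)
```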

I couldn't figure out what causes this problem. I also found a similar question, but it didn't answer mine: (Strategies to debug exploding gradients in pytorch - #7 by mangoxb).
Any help or advice would be appreciated.