Hello, I want to train a cascade of two separate models.
- train model A (no problem)
- train the cascade model:
(a) input → model A → output of A → model B → output of B
(b) some layer weights of A and B are shared
(c) loss = loss(output of A, target) + loss(output of B, target)
(d) the gradient should flow from B to A.
This results in gradient explosion.
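For reference, here is a minimal sketch of the setup described above. The module and variable names (ModelA, ModelB, shared) are placeholders, not my actual code; the real models are larger.

```python
import torch
import torch.nn as nn

# Layer whose weights are shared between A and B (point (b)):
# assigning the same nn.Module instance to both models ties the weights.
shared = nn.Linear(8, 8)

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = shared
        self.head = nn.Linear(8, 8)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = shared  # same module object as in A
        self.head = nn.Linear(8, 8)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

model_a, model_b = ModelA(), ModelB()
x = torch.randn(4, 8)
target = torch.randn(4, 8)

out_a = model_a(x)      # (a) input -> model A
out_b = model_b(out_a)  # (a) output of A -> model B, so (d) grads flow B -> A

criterion = nn.MSELoss()
# (c) combined loss on both outputs
loss = criterion(out_a, target) + criterion(out_b, target)
loss.backward()

# The shared layer accumulates gradients from both loss terms,
# through A's path and through B's path.
print(shared.weight.grad.norm())
```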
I couldn’t find what causes this problem. I found a similar question, but it didn’t answer mine (Strategies to debug exploding gradients in pytorch - #7 by mangoxb).
I would appreciate your help or any advice.