Training different stages of a model with different losses

You are most likely running into this issue: the backward pass fails to compute gradients because the forward activations are stale after a parameter update. The optimizer step modifies parameters in place, and the autograd graph for the second loss still needs their pre-update values.
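A minimal sketch of the failure and one fix, assuming a two-stage setup (the names `stage1`, `stage2`, and the squared-activation losses are placeholders for illustration): stepping the first optimizer between the two `backward()` calls mutates parameters that the second loss's graph still references, so autograd raises a `RuntimeError`. Computing all gradients before any optimizer step avoids it.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage model: stage2 consumes stage1's output.
stage1 = nn.Linear(4, 4)
stage2 = nn.Linear(4, 1)
opt1 = torch.optim.SGD(stage1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(stage2.parameters(), lr=0.1)

x = torch.randn(2, 4, requires_grad=True)
h = stage1(x)
out = stage2(h)
loss1 = h.pow(2).mean()   # loss on the intermediate stage
loss2 = out.pow(2).mean()  # loss on the final stage

# Broken ordering: opt1.step() updates stage1's parameters in place,
# so the saved forward activations for loss2's graph are stale.
loss1.backward(retain_graph=True)
opt1.step()
err = None
try:
    loss2.backward()
except RuntimeError as e:
    err = e  # "... modified by an inplace operation"
print("second backward failed:", err is not None)

# Fix: accumulate all gradients first, then step both optimizers.
opt1.zero_grad()
opt2.zero_grad()
h = stage1(x)
out = stage2(h)
h.pow(2).mean().backward(retain_graph=True)
out.pow(2).mean().backward()
opt1.step()
opt2.step()
print("reordered version ran fine")
```

Another option, if the stages really must be stepped separately, is to re-run the forward pass after each `step()` so the second loss is computed from fresh activations.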
