You could register forward hooks on the penultimate layers of M2 and M3 as described here. During the forward pass of these models the hooks will fire and store the outputs, e.g. in a dict. Once this is done you can pass the activations from the dict to M5 and continue the training. Depending on whether you want to calculate the gradients of the loss w.r.t. the parameters of M2 and M3, you could either store the intermediate activations directly or detach() them in the forward hook.
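A minimal sketch of this idea, assuming M2, M3, and M5 are small placeholder models (your actual architectures will differ):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for M2, M3, and M5; replace with your real models.
M2 = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 4))
M3 = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 4))
M5 = nn.Linear(16, 2)  # consumes the concatenated penultimate activations

activations = {}

def get_hook(name, detach=False):
    # Set detach=True if you don't need gradients flowing back into M2/M3.
    def hook(module, inputs, output):
        activations[name] = output.detach() if detach else output
    return hook

# Register the hooks on the penultimate layers (index 1 here: the ReLU).
M2[1].register_forward_hook(get_hook("M2"))
M3[1].register_forward_hook(get_hook("M3"))

x = torch.randn(5, 10)
_ = M2(x)  # hook fires and stores the penultimate activation
_ = M3(x)

# Pass the stored activations to M5 and continue training.
out = M5(torch.cat([activations["M2"], activations["M3"]], dim=1))
loss = out.mean()
loss.backward()  # gradients reach M2 and M3 since we did not detach
```

With `detach=True` the stored tensors are cut from the graph, so `loss.backward()` would only update M5.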