Recreating new parameters in the forward pass (as done in decoder) wouldn’t make sense: they won’t be trained, and initializing them on every call might also add a performance penalty that could be avoided.
However, I would first recommend making sure the models are actually the same, as described in your cross-post, since I’m unsure what the status of this debugging effort is.
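
To illustrate the first point, here is a minimal sketch of the difference between creating a layer inside `forward` and registering it once in `__init__`. The module names, the `nn.Linear` layer, and the shapes are made up for illustration and are not taken from your model:

```python
import torch
import torch.nn as nn


class RecreatesInForward(nn.Module):
    # Hypothetical module: builds a fresh layer on every call.
    def forward(self, x):
        # New, randomly initialized weights each time; never registered on self.
        layer = nn.Linear(x.size(-1), 4)
        return layer(x)


class CreatesInInit(nn.Module):
    # Hypothetical module: the layer is registered once and reused.
    def __init__(self, in_features):
        super().__init__()
        self.layer = nn.Linear(in_features, 4)

    def forward(self, x):
        return self.layer(x)


bad = RecreatesInForward()
good = CreatesInInit(8)

# These are the parameters an optimizer built from model.parameters() would see.
num_bad_params = len(list(bad.parameters()))
num_good_params = len(list(good.parameters()))
print(num_bad_params, num_good_params)

# The registered layer gives the same output for the same input.
x = torch.randn(2, 8)
same_output_good = torch.equal(good(x), good(x))
print(same_output_good)
```

Since the layer built inside `forward` is never registered, `bad.parameters()` is empty, so an optimizer has nothing to update, and each call pays the initialization cost again while producing outputs from freshly sampled weights.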