Training time too high when TensorFlow code converted to PyTorch

Recreating new parameters in the forward pass (as done in the decoder) wouldn't make sense, since these parameters won't be registered in the module (and thus won't be seen by the optimizer or trained), and their repeated initialization would also add a performance penalty that could be avoided.
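Here is a minimal sketch of what I mean (the `Decoder` module and its shapes are hypothetical, not taken from your code): the parameter is registered once in `__init__`, instead of being recreated in `forward`:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Register the parameter once in __init__ so it shows up in
        # model.parameters() and is picked up by the optimizer.
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Anti-pattern to avoid: creating the parameter here instead,
        # e.g. `weight = nn.Parameter(torch.randn(...))`, would reinitialize
        # it in every forward pass, so it would never be trained, and the
        # repeated init would slow down each iteration.
        return x @ self.weight.t()

model = Decoder(16, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # sees self.weight
```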
However, I would first recommend making sure the models are actually equivalent, as described in your cross-post, since I'm unsure what the status of that debugging effort is.