Make values consistent across versions

I think a full, layer-by-layer comparison of the forward pass in both versions would be useful.
To do so, you could use forward hooks (as explained here) to save all intermediate output tensors.
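A minimal sketch of the hook setup, assuming a standard `nn.Module`. The helper name `register_activation_hooks` is hypothetical. It attaches a hook to every submodule via `named_modules()` and stores tensor outputs only, so modules returning tuples (e.g. RNNs) would need extra handling:

```python
import torch
import torch.nn as nn


def register_activation_hooks(model: nn.Module):
    """Hypothetical helper: hook every submodule and store its output tensor.

    Returns the dict the outputs are written into and the hook handles,
    so the hooks can be removed once the activations are collected.
    """
    activations = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only plain tensor outputs are stored; tuples etc. are skipped.
            if isinstance(output, torch.Tensor):
                # Detach and clone so later in-place ops cannot alter the copy.
                activations[name] = output.detach().clone()
        return hook

    for name, module in model.named_modules():
        if name == "":  # skip the root module itself
            continue
        handles.append(module.register_forward_hook(make_hook(name)))
    return activations, handles


# Usage with a small example model (the architecture is only illustrative).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()  # disable dropout/batchnorm updates so runs are deterministic
acts, handles = register_activation_hooks(model)
with torch.no_grad():
    model(torch.ones(1, 4))
for h in handles:
    h.remove()  # stop recording once the activations are stored
print(list(acts))  # prints ['0', '1', '2']
```

The keys are the submodule names from `named_modules()`, so the same architecture yields the same keys in both versions, which makes the later comparison straightforward.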
Once the hooks are working, feed the same constant input (e.g. a tensor of all ones) to both models, store all activations, and compare them layer by layer to find where the outputs first diverge.
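A sketch of the comparison step, assuming each version's activations have already been collected as a `{layer_name: tensor}` dict (for example saved with `torch.save` in one environment and loaded with `torch.load` in the other). The function name `compare_activations` and the tolerance are hypothetical choices; it reports the max absolute difference per layer and the first layer that exceeds the tolerance:

```python
import torch


def compare_activations(acts_a: dict, acts_b: dict, atol: float = 1e-6):
    """Hypothetical helper: compare two {layer_name: tensor} dicts.

    Returns (diffs, first_bad): diffs maps each layer in acts_a to its max
    absolute difference, or None if the layer is missing from acts_b or has
    a different shape; first_bad is the first layer (in acts_a order) whose
    difference exceeds atol or is None, else None.
    """
    diffs = {}
    for name, a in acts_a.items():
        b = acts_b.get(name)
        if b is None or a.shape != b.shape:
            diffs[name] = None  # missing layer or shape mismatch
            continue
        # Cast to float so integer or half-precision outputs compare cleanly.
        diffs[name] = (a.float() - b.float()).abs().max().item()
    first_bad = next(
        (n for n, d in diffs.items() if d is None or d > atol), None
    )
    return diffs, first_bad


# Usage with synthetic activations standing in for the two versions.
old = {"0": torch.ones(1, 8), "1": torch.ones(1, 8), "2": torch.zeros(1, 2)}
new = {"0": torch.ones(1, 8), "1": torch.ones(1, 8) + 1e-3, "2": torch.zeros(1, 2)}
diffs, first_bad = compare_activations(old, new)
print(first_bad)  # prints 1
```

Because hooks fire in forward order, the first mismatching key usually points at the operation whose behavior changed; later layers then differ mainly because their inputs already differ.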