Thanks for the test.
In that case I would recommend using a fixed input (sample the data once and save it) and then comparing the outputs layer by layer, running the same model in both your training and validation scripts.
If you are using the same preprocessing and the state_dict was loaded successfully, the outputs should match, so something apparently went wrong.
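Here is a minimal sketch of how the layer-by-layer comparison could look, using forward hooks to record each leaf module's output. Note that `make_model`, the input shape, and the helper names `capture_activations` and `first_mismatch` are placeholders I made up for illustration, so swap in your own model and your saved input. It also assumes every layer returns a single tensor.

```python
import torch
import torch.nn as nn


def capture_activations(model, x):
    """Run one forward pass and return {module_name: output} for each leaf module."""
    activations = {}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only tensor outputs are recorded (assumption: no tuple/dict outputs).
            if isinstance(output, torch.Tensor):
                activations[name] = output.detach().clone()
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(x)

    for h in hooks:
        h.remove()
    return activations


def first_mismatch(acts_a, acts_b, atol=1e-6):
    """Return the name of the first layer whose outputs differ, or None."""
    for name, out_a in acts_a.items():  # dict keeps forward-execution order
        out_b = acts_b[name]
        if out_a.shape != out_b.shape or not torch.allclose(out_a, out_b, atol=atol):
            return name
    return None


def make_model():
    # Hypothetical stand-in for your real architecture.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 4))


torch.manual_seed(0)
# In practice: sample one batch, torch.save it once, and torch.load it in both scripts.
fixed_input = torch.randn(2, 8)

train_model = make_model()            # the model as used in the training script
val_model = make_model()              # the model as rebuilt in the validation script
val_model.load_state_dict(train_model.state_dict())

# Use the same mode in both scripts, otherwise dropout/batchnorm will differ.
train_model.eval()
val_model.eval()

mismatch = first_mismatch(
    capture_activations(train_model, fixed_input),
    capture_activations(val_model, fixed_input),
)
print("first differing layer:", mismatch)  # None if every layer agrees
```

The first layer name it reports is where the two runs start to diverge, which usually narrows things down to a parameter that was not loaded, a different `train()`/`eval()` mode, or a preprocessing difference (if already the first layer disagrees).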