Reproducibility in Vanilla PyTorch

I am getting different results for implementations in Lightning and in vanilla PyTorch, which to my understanding should be equivalent, and I’d like to find out why. I have taken the code in From PyTorch to PyTorch Lightning, specifically the code under “full training loop for pytorch” and “full training loop for lightning”, and introduced the following changes (sketched in code below):

  1. for both: added pl.seed_everything(42) right after the imports.
  2. for lightning: disabled the sanity check in the Trainer.
  3. for lightning: replaced self.forward(...) calls with self(...).
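
A minimal sketch of how those three changes look on the Lightning side (the toy LitModel and the random TensorDataset are placeholders, not the actual model from the guide):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# change 1: seed Python, NumPy, torch and CUDA right after the imports
pl.seed_everything(42)


class LitModel(pl.LightningModule):
    """Toy stand-in for the module from the guide."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)  # change 3: call self(x) instead of self.forward(x)
        return nn.functional.cross_entropy(logits, y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


train_loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,))),
    batch_size=32,
)

trainer = pl.Trainer(
    max_epochs=1,
    num_sanity_val_steps=0,  # change 2: disable the validation sanity check
)
trainer.fit(LitModel(), train_loader)
```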

However, the two scripts do not produce identical results. Does anybody know why?

PL does not really modify your code.
Make sure you use the test_step in PL (the test loop runs under torch.no_grad() and switches layers such as BatchNorm or Dropout to eval mode).
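
In vanilla PyTorch that corresponds to something like this sketch (generic model and data, just to illustrate the eval-time setup):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# hypothetical model and data, only to show the eval-time setup
model = nn.Sequential(nn.Linear(32, 2), nn.Dropout(0.5))
test_loader = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))),
    batch_size=16,
)

model.eval()           # switches Dropout/BatchNorm to eval behaviour
correct = 0
with torch.no_grad():  # no gradient tracking, like Lightning's test loop
    for x, y in test_loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
print(correct / len(test_loader.dataset))
```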
It could also be about multiprocessing in the DataLoader, as there are reported issues (the seed is copied into every worker, so each worker returns exactly the same random transforms).
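
One common workaround is the pattern from the PyTorch reproducibility notes: re-seed NumPy and random inside each worker and fix the DataLoader generator (sketch below, generic dataset assumed):

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # torch.initial_seed() already differs per worker, so deriving the
    # NumPy / random seeds from it gives each worker distinct transforms
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


dataset = TensorDataset(torch.randn(128, 32), torch.randint(0, 2, (128,)))
loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    worker_init_fn=seed_worker,                   # re-seed each worker process
    generator=torch.Generator().manual_seed(42),  # reproducible shuffling
)
```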

Besides that, check that you can reproduce your own model run-to-run; as I mentioned, PL doesn’t really change the model or the underlying tools.
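
A minimal sketch of what that check could look like in plain PyTorch (toy model and data, hypothetical names):

```python
import random
import numpy as np
import torch
from torch import nn


def seed_all(seed=42):
    # seed every RNG the training loop might touch
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # optional: trade speed for deterministic cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def train_once():
    # toy stand-in for the plain-PyTorch training loop
    seed_all(42)
    model = nn.Linear(32, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(256, 32), torch.randint(0, 2, (256,))
    loss = None
    for _ in range(10):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


# two runs with the same seed should give identical losses;
# if they already differ here, the gap is not Lightning's doing
print(train_once(), train_once())
```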

Thank you. I will try your suggestions and report back.