Reproducibility problem - not working as expected after 3 years

I reproduced this experiment 2 years ago. It is partly hosted at the GitHub link below.

At the moment, even the demo notebook does not work as expected.

To reproduce the issue:

  • clone the repository
  • run sample.ipynb
  • observe the “pred_prob” output.

The pred_prob output saved in the hosted notebook is drastically different from what I get when I run the notebook on my machine.

Expected result: tensor([[6.1537e-09, 1.0000e+00, 5.6260e-11, 2.5552e-15, 6.2870e-09, 4.2733e-11, .1202e-08]], device='cuda:0')
What I get: tensor([[0.0037, 0.2303, 0.0036, 0.0065, 0.0014, 0.0768, 0.6778]], device='cuda:0')
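
To make the size of the difference concrete, here is a minimal sketch in plain Python that compares the two printed outputs. The values are copied by hand from the tensors above (the last "expected" entry is kept exactly as printed), and the `argmax` helper is only for illustration; it is not part of the repository.

```python
# Values copied from the two printed tensors above. The last "expected"
# entry is kept exactly as it was printed, so treat it as approximate.
expected = [6.1537e-09, 1.0000e+00, 5.6260e-11, 2.5552e-15,
            6.2870e-09, 4.2733e-11, .1202e-08]
actual = [0.0037, 0.2303, 0.0036, 0.0065, 0.0014, 0.0768, 0.6778]


def argmax(values):
    """Return the index of the largest entry (hypothetical helper)."""
    return max(range(len(values)), key=values.__getitem__)


expected_class = argmax(expected)
actual_class = argmax(actual)

# Largest element-wise gap between the two probability vectors.
max_abs_diff = max(abs(e - a) for e, a in zip(expected, actual))

print("predicted class (saved notebook):", expected_class)
print("predicted class (my machine):   ", actual_class)
print("largest absolute difference:    ", round(max_abs_diff, 4))
```

So this is not a small numerical drift: the predicted class itself changes, and the confident prediction in the saved notebook becomes a much flatter distribution on my machine.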

Requirements are nibabel, pandas, numpy and pytorch.

What I tried:
I created a new environment and tried several older CUDA/PyTorch combinations for PyTorch 0.4.1 and 1.0.0. I also downgraded nibabel and numpy to their versions as of 2018-12-12 using pypi-timemachine. None of this changed the result.
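
In case it helps to compare environments, this is a small sketch of how the installed versions of the listed requirements could be reported. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages are installed under the distribution names `torch`, `numpy`, `nibabel` and `pandas`; the `installed_versions` helper is a name made up for this example.

```python
import platform
from importlib import metadata

# Distribution names assumed for the requirements listed in this post.
PACKAGES = ["torch", "numpy", "nibabel", "pandas"]


def installed_versions(packages):
    """Map each package name to its installed version, or a placeholder."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


report = installed_versions(PACKAGES)

print("python", platform.python_version())
for name, version in report.items():
    print(name, version)
```

This only reads package metadata, so it does not show which CUDA build PyTorch was compiled against; printing `torch.version.cuda` inside the environment would cover that. In an older interpreter without `importlib.metadata`, printing each package's `__version__` would be the fallback.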

I can also say that the problem is not limited to loading the checkpoint. My own code, which runs from scratch and used to work, no longer works either.

Any advice is appreciated.