PyTorch VERY different results on different machines using docker and CPU

I am trying to create some CI tests for a deep learning model and therefore created a Dockerfile to install PyTorch and all my requirements and finally run the tests.

FROM pytorch/pytorch
ADD . project/
RUN (cd project/; pip install -r requirements.txt)
CMD ( cd project/; pytest -v --cov=my_project)

Each test computes an image with pixel values in the range [0, 1] and compares it to a reference image (saved as .npy). The test checks whether the average per-pixel L2 norm of the difference is below a threshold of 1e-7.

diff_image = np.linalg.norm(target_image_np - reference_image_np, axis=2)
avg_error = np.mean(diff_image)
assert avg_error < 1e-7
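
For anyone who wants to reproduce the check without the model, here is a minimal self-contained sketch. The helper name `mean_pixel_l2_error` and the synthetic arrays are assumptions for illustration only; the real tests load the computed image and the .npy reference instead.

```python
import numpy as np


def mean_pixel_l2_error(target, reference):
    """Average over all pixels of the L2 norm of the per-pixel channel difference.

    Both inputs are expected to have shape (height, width, channels).
    """
    diff_image = np.linalg.norm(
        target.astype(np.float64) - reference.astype(np.float64), axis=2
    )
    return float(np.mean(diff_image))


# Synthetic stand-ins for the real images (hypothetical data).
rng = np.random.default_rng(0)
reference = rng.random((4, 5, 3))

# An identical image gives exactly zero error and passes the 1e-7 threshold.
identical_error = mean_pixel_l2_error(reference, reference)

# Shifting every channel by 1e-3 gives an error of about sqrt(3) * 1e-3,
# which is far above the threshold.
shifted_error = mean_pixel_l2_error(reference + 1e-3, reference)
```

Logging the maximum per-pixel error alongside the mean may also help to see whether a failure comes from a few outlier pixels or from a global shift.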

12 of the 15 test cases pass. However, 3 cases are failing quite badly.

=========================== short test summary info ============================
FAILED test_nst.py::test_nst_gatys - assert 0.0021541715 < 1e-07       
FAILED test_nst.py::test_nst_gatys_style - assert 0.12900369 < 1e-07
FAILED test_nst.py::test_nst_wct - assert 0.027357593 < 1e-07
=================== 3 failed, 12 passed in 670.27s (0:11:10) ===================

The weird thing is that this ONLY happens on the CI server. On my local machine, all tests pass. Does anybody have an idea why this is happening? As far as I know, running on the CPU with fixed seeds should give results that differ by floating-point rounding at most.

Thanks for any feedback!

We are experiencing a very similar issue:

Thanks for your post. I was not setting the number of threads; doing so fixed the issue for me. My final code looks like this:

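    import os
    import numpy as np
    import torch

    np.random.seed(42)
    torch.manual_seed(42)
    os.environ["PYTHONHASHSEED"] = "42"  # only takes effect for processes started after this point
    torch.backends.cudnn.deterministic = True  # cuDNN flags; not relevant for CPU-only runs
    torch.backends.cudnn.benchmark = False
    torch.set_num_threads(1)  # the setting that fixed the CI mismatch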
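
A likely explanation for why the thread count matters (my assumption, I did not verify it inside PyTorch itself): floating-point addition is not associative, so a reduction that is split across a different number of threads can add the same numbers in a different order and round differently. The pure-Python sketch below only illustrates that effect; `sequential_sum` and `chunked_sum` are hypothetical helpers, not PyTorch functions.

```python
def sequential_sum(values):
    """Add the values strictly left to right, starting from 0.0."""
    total = 0.0
    for value in values:
        total += value
    return total


def chunked_sum(chunks):
    """Mimic a parallel reduction: each chunk is summed on its own,
    then the partial sums are combined."""
    partial_sums = [sequential_sum(chunk) for chunk in chunks]
    return sequential_sum(partial_sums)


# The same three numbers, reduced as one chunk vs. split into two chunks.
one_thread = chunked_sum([[0.1, 0.2, 0.3]])
two_threads = chunked_sum([[0.1], [0.2, 0.3]])

print(one_thread == two_threads)  # False
print(one_thread, two_threads)    # 0.6000000000000001 0.6
```

A single difference like this is tiny, but in an iterative optimization it can plausibly grow over many steps until it exceeds a threshold as tight as 1e-7. Pinning the thread count keeps the reduction order the same on every machine.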