Non-Deterministic Behaviour on same GPU and same Docker Image?

Hi!

I am trying to get deterministic results for the same TorchScript file on the same GPU and the same Docker image across different compute instances. However, I get different results depending on the server instance, even though the Docker image, the GPU and the underlying OS are the same. I have turned off all possible sources of non-determinism:

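    at::globalContext().setDeterministicAlgorithms(true);  // deterministic kernels; throw on ops without one
    at::globalContext().setDeterministicCuDNN(true);       // equivalent of cudnn.deterministic = true
    at::globalContext().setBenchmarkCuDNN(false);          // disable cuDNN autotuning (benchmark mode)
    at::manual_seed(0);                                    // fixed RNG seed
    torch::jit::setGraphExecutorOptimize(false);           // disable JIT graph executor optimizations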

and even all runtime optimizations. With these settings, I get consistent results across runs on the same compute instance, but different results across compute instances, even though the Docker image and the CUDA hardware are identical.
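A minimal sketch for narrowing this down, assuming the model output is a float32 tensor that has been copied to host memory (for example via `output.to(torch::kCPU).contiguous()` and then read through `data_ptr<float>()` and `numel()`). The idea is to print a bitwise fingerprint of the output for the same fixed input on each instance, and if the fingerprints differ, locate the first element that differs. The helper names (`fingerprint`, `first_bitwise_mismatch`, `sum_forward`, `sum_reverse`) are hypothetical and not libtorch APIs. The last two only illustrate that floating-point addition is not associative, so a reduction done in a different order can change the low-order bits of a result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <numeric>
#include <vector>

// 64-bit FNV-1a hash over the raw bytes of a float buffer.
// Equal fingerprints on two instances strongly suggest bit-identical outputs.
std::uint64_t fingerprint(const float* data, std::size_t n) {
    const auto* bytes = reinterpret_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < n * sizeof(float); ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;                  // FNV-1a prime
    }
    return h;
}

// Index of the first element whose bit pattern differs, or -1 if none does.
// Bitwise comparison (unlike ==) also distinguishes 0.0f from -0.0f.
long long first_bitwise_mismatch(const std::vector<float>& a,
                                 const std::vector<float>& b) {
    assert(a.size() == b.size());  // both outputs must have the same number of elements
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::memcmp(&a[i], &b[i], sizeof(float)) != 0) {
            return static_cast<long long>(i);
        }
    }
    return -1;
}

// Same values, two summation orders: float addition is not associative,
// so the results can differ.
float sum_forward(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

float sum_reverse(const std::vector<float>& v) {
    return std::accumulate(v.rbegin(), v.rend(), 0.0f);
}
```

For example, `sum_forward({1e20f, -1e20f, 1.0f})` yields `1.0f` while `sum_reverse` on the same values yields `0.0f`. Comparing fingerprints is stricter than a tolerance-based check such as `torch::allclose`: it also flags differences in the last bit or in the sign of zero.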

Thanks