Hello,
I am trying to get reproducible calculations with PyTorch across multiple computers, and I have almost managed it, except that I cannot get the same results on a computer that supports AVX512 and one that does not.
Basically, if I execute the following code:
import numpy as np
import torch
import torch.nn as nn
import random

# ask PyTorch for deterministic algorithms and seed every RNG
torch.use_deterministic_algorithms(True)
seed = 42
random.seed(seed)
torch.manual_seed(seed)
np.random.seed(seed)

device = torch.device("cpu")

def layer_init(layer, std=np.sqrt(2), bias_const=0.0):
    # orthogonal weight initialisation with a constant bias
    torch.nn.init.orthogonal_(layer.weight, std)
    torch.nn.init.constant_(layer.bias, bias_const)
    return layer

network = nn.Sequential(
    layer_init(nn.Linear(100, 100)),
    layer_init(nn.Linear(100, 1), std=1.0),
).to(device)

with torch.no_grad():
    action = network(torch.rand(size=(100,)).to(device))

# print the single output value with full precision
print(format(action.cpu().numpy()[0].item(), '.60g'))
The result on the AVX512 computer is 0.4804637432098388671875, while on the non-AVX512 computer I get 0.4804628789424896240234375, a difference of the order of 1e-6.
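(Just to quantify it, subtracting the two printed values in plain Python gives roughly 8.6e-07:)

a = 0.4804637432098388671875     # AVX512 machine
b = 0.4804628789424896240234375  # non-AVX512 machine
print(abs(a - b))                # ~8.6e-07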
I understand that a difference like this is expected with floating-point calculations. My problem is that there should be zero difference when I specify that AVX512 should not be used (i.e. by setting ATEN_CPU_CAPABILITY=avx2): if the same vector instructions are used on both computers, there should not be any difference.
However, even when asking PyTorch to use only AVX2, the two computers still give different results. Do you have any idea how to avoid AVX512 on an AVX512-capable computer so that the results are reproducible?
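For reference, this is the kind of setup I mean (the script name repro.py is just a placeholder for the code above, and torch.backends.cpu.get_cpu_capability() should be available in recent PyTorch versions, as far as I know):

# set the capability before the process starts:
#   ATEN_CPU_CAPABILITY=avx2 python repro.py
#
# or, equivalently, from inside the script before importing torch:
import os
os.environ["ATEN_CPU_CAPABILITY"] = "avx2"

import torch

# should report "AVX2" on both machines once the variable is taken into account
print(torch.backends.cpu.get_cpu_capability())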
This is a continuation of this GitHub issue, in which I already explained my problem and was advised to come here.
Thanks