Hello,
I have a non-determinism issue when calling the torch.fft.rfft API on two different Linux machines.
The first machine is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
The second machine is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6238R CPU @ 2.20GHz
The Python code I'm running is below. I have two checksum functions to check the input and output tensors when calling the rfft API.
When running this code on the two machines above, I get the same input checksum but different output checksums.
I am using PyTorch 1.12.0 and Python 3.7, and I tried with both Float and Double tensors.
I also observed the same issue in C++ code.
How can I resolve this non-determinism issue?
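For context on why the checksums are so sensitive: two doubles that differ by a single ulp have different byte patterns and therefore produce completely unrelated CRC32 values. A minimal sketch (independent of the code below):

```python
import binascii
import struct

# 0.1 + 0.2 and 0.3 differ by one ulp in IEEE-754 double precision,
# but their byte patterns yield completely different CRC32 checksums.
a = struct.pack('<d', 0.1 + 0.2)
b = struct.pack('<d', 0.3)
print(abs((0.1 + 0.2) - 0.3))                  # ~5.55e-17
print(binascii.crc32(a) == binascii.crc32(b))  # False
```

So even a last-bit difference in a single FFT output element (e.g. from a different SIMD code path on each CPU) changes the checksum entirely.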
import struct
import torch.fft
import binascii
import numbers
import random
import numpy as np
np.random.seed(0)
random.seed(0)
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)
def float_to_hex(f):
    return hex(struct.unpack('<Q', struct.pack('<d', f))[0])
# Checksum function to be run on a list
def getCheckSum(array):
    c = 0
    for x in array:
        if x != 0.0:
            c = binascii.crc32(binascii.a2b_hex(float_to_hex(x)[2:]), c)
    return c
# Checksum function to be run on a tensor
def getCheckSumT(tensor):
    c = 0
    tensor = tensor.view(tensor.numel())
    for x in tensor:
        if isinstance(x.item(), numbers.Complex):
            value = x.item().real
            if x.item().imag != 0.0:
                c = binascii.crc32(binascii.a2b_hex(float_to_hex(x.item().imag)[2:]), c)
        else:
            value = x.item()
        if value != 0.0:
            c = binascii.crc32(binascii.a2b_hex(float_to_hex(value)[2:]), c)
    return c
# Code testing the non determinism
array = [1.0] * 25
noise = np.random.random(len(array)) / 10e3
array = [a + n for a, n in zip(array, noise)]
#print(array)
tensor = torch.DoubleTensor(array)
T2 = torch.fft.rfft(tensor)
print("pytorch version:", torch.__version__)
# Run the checksum on the input array and on the tensor; the expectation is that they match
ArrayCS = getCheckSum(array)
TensorCS = getCheckSumT(tensor)
assert(ArrayCS == TensorCS)
print("Array and Tensor checksums are the same:", ArrayCS)
print("Output tensor checksum:", getCheckSumT(T2))
print("Output tensor:", T2)
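For what it's worth, a tolerance-based comparison may be more appropriate than a bitwise checksum for floating-point FFT output. A minimal sketch of the idea, simulating a one-ulp cross-machine difference (the perturbation here is artificial, just to illustrate):

```python
import torch

out = torch.fft.rfft(torch.ones(25, dtype=torch.float64))

# Simulate a one-ulp cross-machine discrepancy: nudge every real/imag
# component to the next representable double toward +inf.
re_im = torch.view_as_real(out.clone())
nudged = torch.view_as_complex(torch.nextafter(re_im, re_im + 1.0))

print(torch.equal(out, nudged))    # False: bit patterns differ
print(torch.allclose(out, nudged)) # True: numerically equivalent
```

An exact (bit-for-bit) comparison or CRC32 checksum fails on such data, while torch.allclose with a reasonable rtol/atol still treats the two results as equal.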