Different inference results when running on arm64 and x86_64

Klaus_Voltmer · May 7, 2020, 8:12am

Hi,

I get different inference results when running my C++ code on arm64 and x86_64. I built PyTorch from source as described here: https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source

The results on x86_64 are correct and the results on arm64 are incorrect. I double-checked against the Python results.

I am stepping in the dark, so any hint is most appreciated!

LeviViana · May 7, 2020, 11:46am

Can you provide the code?

Klaus_Voltmer · May 7, 2020, 12:11pm

Sure.

Load the model:
module = torch::jit::load(fileName);
Process:
torch::Tensor result;
std::vector<torch::jit::IValue> inputs;
// from_blob does not copy: data must stay alive while tensor is in use
at::Tensor tensor = torch::from_blob(data.data(), {1, 1, 11, 144});
inputs.push_back(tensor);
// disable gradient tracking for inference
torch::autograd::AutoGradMode guard(false);
result = module.forward(inputs).toTensor();

I have written unit tests and integration tests that show the results are equal to Python, but once I run this on the iPhone the results are different.

ptrblck · May 8, 2020, 3:51am

How are you creating the data and how large is the difference?

Klaus_Voltmer · May 8, 2020, 12:22pm

I am creating the data from a std::vector<float>. It contains 11 spectra of 144 values each.

The difference is quite big, roughly a factor of 2.
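
To put a number on "how large is the difference", one option is to copy the output of each platform into a plain float buffer and compare the two element by element. The following is a minimal sketch using only the standard library; `DiffStats` and `compare_outputs` are hypothetical names introduced here for illustration, not part of libtorch, and the `eps` floor on the denominator is an assumed choice to avoid dividing by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not a libtorch API).
struct DiffStats {
    float max_abs;  // largest |actual[i] - reference[i]|
    float max_rel;  // largest |actual[i] - reference[i]| / max(|reference[i]|, eps)
};

// Compares two equally sized output buffers, e.g. the arm64 result
// against the x86_64 (or Python) reference.
DiffStats compare_outputs(const std::vector<float>& actual,
                          const std::vector<float>& reference,
                          float eps = 1e-6f) {
    assert(actual.size() == reference.size());
    DiffStats stats{0.0f, 0.0f};
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const float abs_err = std::fabs(actual[i] - reference[i]);
        // eps keeps the relative error finite when the reference is ~0
        const float denom = std::max(std::fabs(reference[i]), eps);
        stats.max_abs = std::max(stats.max_abs, abs_err);
        stats.max_rel = std::max(stats.max_rel, abs_err / denom);
    }
    return stats;
}
```

With this measure, an output that is off by a factor of 2 everywhere gives a `max_rel` of about 1.0, whereas ordinary floating-point differences between backends would be expected to stay orders of magnitude smaller.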

tom · May 8, 2020, 1:00pm

So arm64 and amd64 will use different backends. It is quite possible that you found a bug in the arm64 one, in particular if you use less-common modules. (e.g. I had that with transposed convs a year ago on arm32, where a network would run fine on amd64 but the output was messed up on my phone.)
I know it is a lot of work, but if you want, the ideal reproducing case would be to narrow down the network to a single module where things go wrong and then provide Module+Parameters and inputs. This would also limit how much you need to tell us about it.

Best regards

Thomas
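
The narrowing-down described above can be approached by dumping the intermediate output of each stage of the network on both platforms and looking for the first stage where they disagree. The sketch below assumes the per-stage outputs have already been collected into plain float buffers in the same order on both sides (how they are collected is left open here); `StageOutput`, `first_divergent_stage`, and the default tolerance are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one stage's flattened output.
struct StageOutput {
    std::string name;           // e.g. the name of the module that produced it
    std::vector<float> values;  // flattened output tensor
};

// Returns the index of the first stage whose outputs differ by more than
// tol in any element (or differ in length), or -1 if all compared stages
// agree. Assumes both lists hold the same stages in the same order.
int first_divergent_stage(const std::vector<StageOutput>& reference,
                          const std::vector<StageOutput>& actual,
                          float tol = 1e-4f) {
    assert(tol >= 0.0f);
    const std::size_t n = std::min(reference.size(), actual.size());
    for (std::size_t s = 0; s < n; ++s) {
        const std::vector<float>& r = reference[s].values;
        const std::vector<float>& a = actual[s].values;
        if (r.size() != a.size()) return static_cast<int>(s);
        for (std::size_t i = 0; i < r.size(); ++i) {
            // written as !(x <= tol) so that NaN also counts as divergent
            if (!(std::fabs(a[i] - r[i]) <= tol)) return static_cast<int>(s);
        }
    }
    return -1;
}
```

The stage at the returned index is the candidate for a single-module reproduction: its input is the last output on which both platforms still agree.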