Conv2D on ARM device gives wrong result

I’m getting different results for a Conv2D operation on ARM and x86_64 devices. The output tensors are identical in some parts, but large blocks of values differ in others. A quick view of the differences: difference.png on the main branch of octavianmm/torch_nn_functional_conv2d_problem on GitHub.
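Because the two platforms disagree, a slow but transparent reference can help decide which output is correct. The sketch below is a naive pure-Python convolution that follows the cross-correlation convention of `torch.nn.functional.conv2d` (the kernel is not flipped). It assumes stride 1, no padding, no dilation and a single group, and the name `conv2d_reference` is hypothetical. The parameters used in the repository may differ, so the loops would need adjusting. Inputs are nested lists, which can be obtained from a tensor with `Tensor.tolist()`.

```python
def conv2d_reference(inp, weight, bias=None):
    """Naive NCHW conv2d (cross-correlation), assuming stride 1, no padding, groups=1.

    inp:    nested lists of shape [N][C][H][W]
    weight: nested lists of shape [O][C][KH][KW]
    bias:   optional list of length O
    Returns nested lists of shape [N][O][H-KH+1][W-KW+1].
    """
    n_batch = len(inp)
    c_in, h, w = len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    c_out, kh, kw = len(weight), len(weight[0][0]), len(weight[0][0][0])
    if len(weight[0]) != c_in:
        raise ValueError("weight in-channels do not match input channels")

    out_h, out_w = h - kh + 1, w - kw + 1
    out = []
    for n in range(n_batch):
        sample = []
        for o in range(c_out):
            plane = []
            for y in range(out_h):
                row = []
                for x in range(out_w):
                    # Start from the bias (or zero) and accumulate over channels and kernel taps.
                    acc = bias[o] if bias is not None else 0.0
                    for c in range(c_in):
                        for i in range(kh):
                            for j in range(kw):
                                acc += inp[n][c][y + i][x + j] * weight[o][c][i][j]
                    row.append(acc)
                plane.append(row)
            sample.append(plane)
        out.append(sample)
    return out
```

The loops are very slow, so this is only practical on a small input crop, not on a full feature map.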
A minimal working example that reproduces this problem is in the GitHub repository octavianmm/torch_nn_functional_conv2d_problem ("Different output on ARM and x86_64 architectures for torch.nn.functional.conv2d").
Also, the repo contains the two tensors obtained on ARM and Intel platforms using the same code, so the differences can be easily inspected.
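With the two saved tensors, a small helper can show where they disagree and group the differing rows into contiguous runs, which correspond to the blocks in the difference image. This is a sketch assuming each output channel has first been converted to a 2-D nested list (for example `t[0, k].tolist()` for channel `k` of a loaded tensor `t`). The helper names are hypothetical. A tolerance is used because float32 convolutions on different backends can legitimately differ in the last bits due to a different summation order. Large blocks of wrong values should still stand out well beyond it.

```python
import math


def mismatch_mask(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """2-D list of booleans: True where planes a and b disagree beyond tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("planes must have the same shape")
    return [
        [not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def row_runs(mask):
    """Group rows containing at least one mismatch into inclusive (start, end) runs."""
    runs = []
    for r, row in enumerate(mask):
        if any(row):
            if runs and runs[-1][1] == r - 1:
                runs[-1] = (runs[-1][0], r)  # extend the current run
            else:
                runs.append((r, r))  # start a new run
    return runs


def summarize(a, b, **tol):
    """Count mismatching elements and report where the differing rows are."""
    mask = mismatch_mask(a, b, **tol)
    bad = sum(cell for row in mask for cell in row)
    total = sum(len(row) for row in mask)
    return {
        "mismatches": bad,
        "fraction": bad / total if total else 0.0,
        "row_runs": row_runs(mask),
    }
```

Running `summarize` per output channel quickly shows whether the wrong values are confined to certain channels or spatial regions.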
I discovered this problem while debugging why the same CNN works as expected for inference on x86_64 machines but gives bad results (always predicting the same class) on ARM devices.

Which PyTorch version are you using?
A NEON issue was fixed some time ago, so could you update to the latest release in case you are using an older build?

I’ve tested this on two ARM devices, with PyTorch 1.4.0 and 1.7.0 respectively, and the issue was present in both cases.
Do you know in which PyTorch version this issue was fixed?

UPDATE: I’ve just tried PyTorch 1.8.0 on the NVIDIA Jetson Nano and got the same wrong result.

Thanks for the update. Could you create an issue on GitHub with all necessary information to reproduce and debug this issue further, please?
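Architecture-specific bugs are hard to reproduce without the exact environment. PyTorch ships `python -m torch.utils.collect_env`, which is the usual way to attach environment details to a PyTorch issue. As a lighter sketch (the function name `environment_report` is hypothetical), the snippet below records the CPU architecture, the Python version and, if PyTorch is importable, its version and build configuration from `torch.__config__.show()`.

```python
import platform
import sys


def environment_report():
    """Collect basic platform details relevant to an architecture-specific bug."""
    info = {
        "machine": platform.machine(),  # e.g. 'aarch64' on ARM, 'x86_64' on Intel/AMD
        "system": platform.system(),
        "python": sys.version.split()[0],
    }
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch not installed in this interpreter
    else:
        info["torch"] = torch.__version__
        info["torch_config"] = torch.__config__.show()  # build flags and backends
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it on both the ARM and x86_64 machines and attaching both reports makes differences in build configuration easy to compare.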

Thanks for the tip. I’ve created the issue here: "Conv2D operation on ARM architecture gives wrong result", issue #55781 in pytorch/pytorch on GitHub.