Getting wrong outputs on TX2 with PyTorch compiled from source

Hello,

I trained a custom CNN and want to deploy it on the TX2. The model works correctly on three different non-ARM systems. On the TX2, however, with the same weight file, input, and source code, it produces output values that are larger than expected.
Initially I thought it was a version issue, but I built both v0.3.0 and v0.3.1 from source and both give unexpected output values. Switching between GPU and CPU does not help either. I also built the same version from source on another system, and it works fine there.
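
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the "same weight file, same input" check could be done across machines. It is not the actual project code: the helper names `file_sha256` and `max_abs_diff` are made up for illustration, and it assumes the model outputs have been dumped as plain lists of floats on each system.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("output lengths differ")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)
```

Matching digests for the weight file on both machines would rule out a corrupted or mismatched copy, and `max_abs_diff` on the dumped outputs gives a single number for how far the TX2 result is from the reference.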

Is there something that I am missing here?

Thanks for your time.