I have been working on a PyTorch model for testing purposes. So far I have trained the model using Anaconda on Windows, and it works fine. I wanted to train the same model on a Raspberry Pi 4 (aarch64), so I did the following:
- Deployed Raspbian Lite OS
- Installed torch from ARM wheels, since the Raspberry Pi has an ARM architecture (wheels for torch v1.7.0 and torchvision v0.8.1 from https://mathinf.eu/pytorch/arm64/)
Torch imported successfully, and the model ran on the Raspberry Pi without errors.
However, the loss came out as nan. I debugged and confirmed that there is no issue with the data or the data transformations (no nan inputs), but I found that F.log_softmax(r_out2, dim=1) returns nan values from the very first batch.
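This is roughly how I checked it: verify the tensor going into log_softmax is finite, then inspect the output. (A minimal sketch; the random tensor here is a stand-in for my model's actual r_out2.)

```python
import torch
import torch.nn.functional as F

# Stand-in for the activation that feeds log_softmax in my model
r_out2 = torch.randn(4, 10)

# If the input is entirely finite but log_softmax still produces nan,
# the problem is in the op itself, not in the data pipeline.
assert torch.isfinite(r_out2).all(), "non-finite values before log_softmax"

out = F.log_softmax(r_out2, dim=1)
print(torch.isnan(out).any())  # tensor(False) on a correct build
```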
Anaconda environment versions -
numpy - 1.18.5
torch - 1.7.0
torchvision - 0.8.1
Raspberry Pi versions -
torch - 1.7.0
torchvision - 0.8.1
numpy - 1.18.5 (I also tried upgrading to the latest version, but no luck)
I have no clue why the same model and the same data return nan here when they run perfectly well in the Anaconda environment. So far the only issue is with log_softmax.
Does it have something to do with the torch wheels I installed? Or with the processing power of the Raspberry Pi?
I would be very glad if someone could help me with this issue.
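To isolate whether the wheel's log_softmax kernel is at fault, here is a minimal check that could be run on both machines; it compares the built-in op against a manual, numerically stable log-softmax. (A sketch with a random stand-in tensor, not my model's actual activations; if the two disagree on the Pi but agree under Anaconda, the ARM wheel's kernel would be the likely culprit.)

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 5)

# Built-in kernel under suspicion
builtin = F.log_softmax(x, dim=1)

# Manual, numerically stable log-softmax:
# log_softmax(x) = (x - max) - log(sum(exp(x - max)))
shifted = x - x.max(dim=1, keepdim=True).values
manual = shifted - shifted.exp().sum(dim=1, keepdim=True).log()

print(torch.allclose(builtin, manual, atol=1e-6))  # True on a correct build
```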