CIFAR-10 Classifier Producing NaNs on My Phone

So this is just for fun, but I got PyTorch built and running in UserLand on my Samsung Galaxy S9 (ARM64). It can even train a very simple toy net that learns to multiply by 10.

However, when I train this:

I get NaNs after the first few iterations. I don’t get NaNs when I train on the CPU of my Windows 10 desktop (same code, same batch size, etc.).

I’ve tried adding batch norm layers after the conv and linear layers, which didn’t help. I also tried replacing CrossEntropyLoss with a softmax + NLLLoss, which didn’t help either.
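
For reference, here is a minimal sketch of how the two loss formulations relate. It uses random logits as a stand-in for the network output (an assumption, not the model from this post). In PyTorch, `nn.CrossEntropyLoss` takes raw logits, while `nn.NLLLoss` expects log-probabilities, so the equivalent replacement is `log_softmax` + `NLLLoss` rather than plain `softmax` + `NLLLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in data (assumption): 4 samples, 10 classes as in CIFAR-10.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 0, 9, 3])

# CrossEntropyLoss takes raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# NLLLoss takes log-probabilities, so pair it with log_softmax.
nll_log = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# Feeding plain softmax probabilities into NLLLoss gives a different value:
# the mean of -p[target], which is not a cross-entropy.
nll_prob = nn.NLLLoss()(F.softmax(logits, dim=1), targets)

print(ce.item(), nll_log.item(), nll_prob.item())
```

The first two values should agree; the third is a different quantity, so it is worth checking which variant was actually used when comparing the two machines.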

Nothing is “crucial” about getting this to work. I just assumed CPU mode would give the same results on two different systems, and it would be interesting to know what the difference is. Maybe my PyTorch build is broken in some non-obvious way?
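
One way to narrow down where the two systems diverge is to check, after each backward pass, which parameter or gradient first stops being finite. The sketch below is only an illustration: `first_nonfinite` is a hypothetical helper, and the tiny linear model and random batch are stand-ins, not the network from this post.

```python
import torch
import torch.nn as nn

def first_nonfinite(model):
    """Return (name, kind) for the first parameter or gradient
    that contains NaN/Inf, or None if everything is finite."""
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            return name, "param"
        if p.grad is not None and not torch.isfinite(p.grad).all():
            return name, "grad"
    return None

# Hypothetical stand-in model and batch (assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
print(first_nonfinite(model))  # None when all values are finite

# Simulate a bad value to show what a hit looks like.
with torch.no_grad():
    model[1].weight[0, 0] = float("nan")
print(first_nonfinite(model))
```

Calling the helper once per iteration on both machines would show which layer goes bad first on the phone. `torch.autograd.set_detect_anomaly(True)` can additionally raise an error at the backward op that produced a NaN, at the cost of slower training.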

Has anyone tried this?