I’m running the C++ MNIST example, the same as the Python one, but training seems to be unstable.
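- manual_seed might not be working properly: the results differ between runs
- after several epochs, the loss and accuracy go wrong (the loss becomes nan or jumps above 2.3, and accuracy drops to about 10%)
- this happens on both CPU and GPU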
- I’m using libtorch installed with conda install pytorch=1.0.0 -c pytorch
▶ ./mnist | grep Accuracy
Test set: Average loss: 0.098864, Accuracy: 0.9693
Test set: Average loss: 0.0540888, Accuracy: 0.9813
Test set: Average loss: 0.0359754, Accuracy: 0.9883
Test set: Average loss: 0.0464136, Accuracy: 0.9844
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
▶ ./mnist | grep Accuracy
Test set: Average loss: 0.0982789, Accuracy: 0.9702
Test set: Average loss: 0.0553075, Accuracy: 0.9808
Test set: Average loss: 0.0357092, Accuracy: 0.9878
Test set: Average loss: 0.0512756, Accuracy: 0.9844
Test set: Average loss: 0.0566966, Accuracy: 0.9843
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
Test set: Average loss: nan, Accuracy: 0.098
▶ ./mnist | grep Accuracy
Test set: Average loss: 0.0987975, Accuracy: 0.9697
Test set: Average loss: 0.0572588, Accuracy: 0.9794
Test set: Average loss: 0.0364755, Accuracy: 0.9878
Test set: Average loss: 0.0520559, Accuracy: 0.983
Test set: Average loss: 2.30402, Accuracy: 0.1135
Test set: Average loss: 2.31102, Accuracy: 0.1135
Test set: Average loss: 2.33372, Accuracy: 0.1135
Test set: Average loss: 2.40776, Accuracy: 0.1135
Test set: Average loss: 2.6472, Accuracy: 0.1135
Test set: Average loss: 3.42565, Accuracy: 0.1135
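To pin down the epoch where each run breaks, a small helper like the one below could be applied to the per-epoch test losses. This is only a sketch in plain C++ with no libtorch calls. The name first_divergent_epoch and the jump factor of 10 are my own assumptions and are not part of the mnist example.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of the mnist example): returns the index of
// the first epoch whose loss is NaN/inf or exceeds `factor` times the best
// loss seen so far, or -1 if no such epoch exists.
long first_divergent_epoch(const std::vector<double>& losses,
                           double factor = 10.0) {
  double best = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < losses.size(); ++i) {
    const double loss = losses[i];
    // NaN fails every ordered comparison, so it is checked explicitly.
    if (!std::isfinite(loss) || loss > factor * best) {
      return static_cast<long>(i);
    }
    if (loss < best) best = loss;
  }
  return -1;
}
```

With the losses from the three runs above, this would report zero-based epoch indices 4, 5, and 4, so the failure shows up at nearly the same point every time even though the exact numbers differ.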
You can see my code at