Simple shallow neural network can't converge on 2-class MNIST

Thanks man! You are awesome!!
By the way, it seems like F.mse_loss has a bug: the result doesn't appear to be divided by batch_size.
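
A quick way to check what the loss is actually divided by is to compare the reductions on a tiny batch. This is only a sketch and assumes a PyTorch version where F.mse_loss accepts the `reduction` argument (older releases used `size_average` instead and may differ). The tensors here are made-up toy values, not the MNIST setup from this thread. With `reduction="mean"` the sum of squared errors is divided by the total number of elements (batch_size times the number of outputs per sample), which can look like the batch-size division is missing. To average over the batch only, one option is to use `reduction="sum"` and divide by the batch size yourself.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy batch: 4 samples, 3 outputs each (not the data from this thread).
pred = torch.tensor([[1.0, 2.0, 3.0],
                     [0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [2.0, 2.0, 2.0]])
target = torch.zeros(4, 3)

batch_size = pred.shape[0]

# Default-style reduction: sum of squared errors divided by ALL elements (4 * 3 = 12).
loss_mean = F.mse_loss(pred, target, reduction="mean")

# Plain sum of squared errors, with no division at all.
loss_sum = F.mse_loss(pred, target, reduction="sum")

# Manual per-sample average: divide the summed loss by the batch size only.
loss_per_sample = loss_sum / batch_size

print(loss_mean.item(), loss_sum.item(), loss_per_sample.item())
```

If the per-sample value is the one you expected, the difference from `reduction="mean"` is just the extra factor of outputs per sample, which mainly changes the effective learning rate.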