Unable to Learn XOR Representation using 2 layers of a Multi-Layer Perceptron (MLP)

@jpeg729, you’re missing optimizer.zero_grad() at the start of the training loop. The results will change quite a bit if you add it.

python xor.py --loss MSELoss --learning_rate 0.001 --activation Tanh --optimizer Adam
100 / 100 = 100.00% successes

python xor.py --loss MSELoss --learning_rate 0.001 --activation Sigmoid --optimizer Adam
99 / 100 = 99.00% successes

python xor.py --loss MSELoss --learning_rate 0.001 --activation ReLU --optimizer Adam
77 / 100 = 77.00% successes
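
For reference, a minimal self-contained version of the fixed loop (Tanh + Adam + MSELoss, the combination from the first command above; treat it as a sketch rather than the exact xor.py script):

import torch
from torch import nn, optim
from torch.autograd import Variable

X = Variable(torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]))
Y = Variable(torch.FloatTensor([[0], [1], [1], [0]]))

model = nn.Sequential(nn.Linear(2, 5), nn.Tanh(), nn.Linear(5, 1), nn.Sigmoid())
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for _ in range(10000):
    optimizer.zero_grad()              # the missing call: clear the previous step's gradients
    loss = criterion(model(X), Y)      # forward pass on the whole XOR batch
    loss.backward()                    # compute fresh gradients
    optimizer.step()                   # apply the update

print(model(X).data)                   # should be close to 0, 1, 1, 0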

Oh. Oops. And I’m the one who pointed out that it was missing in the original code. :thinking:

I tried again with ELU and SGD, and got 2% success. Tuning the learning rate does nothing much, but adding momentum=.9 gives 98% success with SGD.

I suppose all this goes to show the vast number of ways we can alter our models and get widely differing results.

That’s probably because not zeroing the gradient has the same effect as adding momentum: it retains some of the gradient from the previous training step.
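
A toy sketch of what I mean (a single throwaway parameter, not the xor.py code): without zero_grad(), the .grad buffer keeps the sum of all past gradients, which is exactly what a momentum buffer with coefficient 1 would hold.

import torch
from torch.autograd import Variable

w = Variable(torch.ones(1), requires_grad=True)
x = Variable(torch.ones(1))

for step in range(3):
    loss = (w * x).sum()
    loss.backward()
    # .grad is never cleared, so it accumulates: 1.0, 2.0, 3.0, ...
    print(step, float(w.grad.data[0]))

# SGD with momentum keeps v_t = mu * v_{t-1} + g_t; skipping zero_grad()
# behaves like the mu = 1 case, reusing the previous step's direction.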

Exactly what I thought. Though the fact that not zeroing the gradients actually worked suggests that with ELU activation and MSELoss the problem was nicely convex, so that repeatedly taking steps in roughly the same direction nearly always led to the solution.

It’s pretty much overkill, but here is a sweep over Activation × Loss × Optimizer:


from itertools import product
from collections import Counter 

import time

import random
random.seed(100)

import numpy as np

import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()

# Activation functions.
from torch.nn import ReLU, ReLU6, ELU, SELU, LeakyReLU
from torch.nn import Hardtanh, Sigmoid, Tanh, LogSigmoid
from torch.nn import Softplus, Softshrink, Tanhshrink, Softmin
from torch.nn import Softmax, LogSoftmax # Softmax2d


# Loss functions.
from torch.nn import L1Loss, MSELoss # NLLLoss, CrossEntropyLoss
from torch.nn import PoissonNLLLoss, KLDivLoss, BCELoss
from torch.nn import BCEWithLogitsLoss, HingeEmbeddingLoss # MarginRankingLoss
from torch.nn import SmoothL1Loss, SoftMarginLoss # MultiLabelMarginLoss, CosineEmbeddingLoss, 
from torch.nn import MultiLabelSoftMarginLoss # MultiMarginLoss, TripletMarginLoss

# Optimizers.
from torch.optim import Adadelta, Adagrad, Adam, Adamax # SparseAdam
from torch.optim import ASGD, RMSprop, Rprop # LBFGS

Activations = [ReLU, ReLU6, ELU, SELU, LeakyReLU, 
                Hardtanh, Sigmoid, Tanh, LogSigmoid,
                Softplus, Softshrink, Tanhshrink, Softmin, 
                Softmax, LogSoftmax]

Criterions = [L1Loss, MSELoss,
              PoissonNLLLoss, KLDivLoss, BCELoss,
              BCEWithLogitsLoss, HingeEmbeddingLoss,
              SmoothL1Loss, SoftMarginLoss,
              MultiLabelSoftMarginLoss]

Optimizers = [Adadelta, Adagrad, Adam, Adamax,
             ASGD, RMSprop, Rprop]

X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T

# Converting the X to PyTorch-able data structure.
X_pt = Variable(FloatTensor(X))
X_pt = X_pt.cuda() if use_cuda else X_pt
# Converting the Y to PyTorch-able data structure.
Y_pt = Variable(FloatTensor(Y), requires_grad=False)
Y_pt = Y_pt.cuda() if use_cuda else Y_pt

# Use FloatTensor.shape to get the shape of the matrix/tensor.
num_data, input_dim = X_pt.shape
num_data, output_dim = Y_pt.shape

learning_rate = 0.03
hidden_dim = 5
num_epochs = 10000
num_experiments = 100



for Activation, Criterion, Optimizer in product(Activations, Criterions, Optimizers):
    all_results=[]
    start = time.time()
    for _ in range(num_experiments):
        model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                              Activation(), 
                              nn.Linear(hidden_dim, output_dim),
                              nn.Sigmoid())
        model = model.cuda() if use_cuda else model
        criterion = Criterion()
        optimizer = Optimizer(model.parameters(), lr=learning_rate)
        
        for _e in range(num_epochs):
            optimizer.zero_grad()
            predictions = model(X_pt)
            loss_this_epoch = criterion(predictions, Y_pt)
            loss_this_epoch.backward()
            optimizer.step()
            ##print(_e, [float(_pred) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])

        x_pred = [int(model(_x) > 0.5) for _x in X_pt]
        y_truth = list([int(_y[0]) for _y in Y_pt])
        all_results.append([x_pred == y_truth, x_pred, loss_this_epoch.data[0]])

    tf, outputsss, losses__ = zip(*all_results)
    print(Activation, Criterion, Optimizer, Counter(tf), time.time() - start)

I’ve some numbers from the sweep: https://github.com/alvations/ixora/blob/master/xor.output

It seems like nothing gets close to the numpy implementation at https://www.kaggle.com/alvations/xor-with-mlp, where I get 100% all the time.

I tried out your code and got your results of non-convergence. I then upped the hidden dimension to 20 and got convergence 100% of the time. Can you confirm that you see the same before we try to figure out why convergence improves with a bigger one-layer hidden dimension? (I think it has to do with saddle points, but first I want to know that you get convergence with a bigger net.)

Also, you don’t need to try out so many losses and activations. For this problem, BCELoss and ReLU will work just fine.

Hi,

In theory, we should be able to obtain a solution with a much smaller network (i.e., 2 hidden units + bias). Please see Section 6.1 of Goodfellow et al. (2016).
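
A quick check of that claim, using the hand-built 2-hidden-unit ReLU network as I recall it from the book (so treat the exact weights as my reading of Section 6.1 rather than a quote):

import torch

# f(x) = w^T max(0, W^T x + c) + b, with b = 0
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
W = torch.FloatTensor([[1, 1], [1, 1]])
c = torch.FloatTensor([0, -1])
w = torch.FloatTensor([[1], [-2]])

print((X.mm(W) + c).clamp(min=0).mm(w))   # -> 0, 1, 1, 0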

The smooth L1 loss and the SELU activation function seem to help the learning process. Below is a solution that uses the autograd example as a starting point.

# -*- coding: utf-8 -*-
import torch
import numpy as np
from torch.autograd import Variable
from torch import FloatTensor
import torch.nn.functional as F

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 2, 2, 2, 1

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(FloatTensor(np.array([[0, 0], [0, 1], [1, 0], [1, 1]])))
y = Variable(FloatTensor(np.array([[0., 1., 1., 0.]])))

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
W = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

c = Variable(torch.zeros(D_in).type(dtype), requires_grad=True)
b = Variable(torch.zeros(D_out).type(dtype), requires_grad=True)

learning_rate = 1e-3
for t in range(200000):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.

    y_pred = F.selu(x.mm(W).add(c)).mm(w).add(b)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    # loss = (y_pred - y).pow(2).sum()
    loss = F.smooth_l1_loss(y_pred, y)
    if t % 10000 == 0:
        print(t, loss.data[0])
        print(t, y_pred.data)
        # print(t, c.data)
        # print(t, w.data)

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    W.data -= learning_rate * W.grad.data
    w.data -= learning_rate * w.grad.data
    c.data -= learning_rate * c.grad.data
    b.data -= learning_rate * b.grad.data

    # Manually zero the gradients after updating weights
    W.grad.data.zero_()
    w.grad.data.zero_()
    c.grad.data.zero_()
    b.grad.data.zero_()

print("W: ")
print(W)

print("w: ")
print(w)



Hi,

After running the code from above many times, I noticed that in some cases the process got stuck in local minima (error around 0.125). To avoid this, I added time-dependent Gaussian noise to the gradients. With noise, the results are much better.
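
The noise standard deviation follows a decaying schedule, mirroring the sigma update in the code below; a tiny helper just to make the schedule explicit (the names eta and gamma are mine):

import numpy as np

def noise_sigma(t, eta=2.0, gamma=0.55):
    # standard deviation of the zero-mean Gaussian added to each gradient at step t
    return np.sqrt(eta / np.power(1.0 + t, gamma))

print(noise_sigma(0), noise_sigma(100000))   # the noise shrinks as training progresses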

import torch
import numpy as np
from torch.autograd import Variable
from torch import FloatTensor
import torch.nn.functional as F

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 2, 2, 2, 1

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(FloatTensor(np.array([[0, 0], [0, 1], [1, 0], [1, 1]])))
y = Variable(FloatTensor(np.array([[0., 1., 1., 0.]])))

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
W = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

c = Variable(torch.zeros(D_in).type(dtype), requires_grad=True)
b = Variable(torch.zeros(D_out).type(dtype), requires_grad=True)

# Create tensors to simulate a normal distribution
W_zeros = torch.zeros(D_in, H).type(dtype)
w_zeros = torch.zeros(H, D_out).type(dtype)
c_zeros = torch.zeros(D_in).type(dtype)
b_zeros = torch.zeros(D_out).type(dtype)

W_sigma = torch.zeros(D_in, H).type(dtype)
w_sigma = torch.zeros(H, D_out).type(dtype)
c_sigma = torch.zeros(D_in).type(dtype)
b_sigma = torch.zeros(D_out).type(dtype)


learning_rate = 1e-3
for t in range(400000):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.

    y_pred = F.selu(x.mm(W).add(c)).mm(w).add(b)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    # loss = (y_pred - y).pow(2).sum()
    loss = F.smooth_l1_loss(y_pred, y)
    if t % 10000 == 0:
        print(t, loss.data[0])
        print(t, y_pred.data)
        # print(t, c.data)
        # print(t, w.data)

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update sigma
    s_2 = 2.0 / np.power(1+t, 0.55)
    W_sigma.fill_(np.sqrt(s_2))
    w_sigma.fill_(np.sqrt(s_2))
    c_sigma.fill_(np.sqrt(s_2))
    b_sigma.fill_(np.sqrt(s_2))

    # Update the gradients
    mW = torch.distributions.Normal(W_zeros, W_sigma)
    W.grad.data += mW.sample()

    mw = torch.distributions.Normal(w_zeros, w_sigma)
    w.grad.data += mw.sample()

    mc = torch.distributions.Normal(c_zeros, c_sigma)
    c.grad.data += mc.sample()

    mb = torch.distributions.Normal(b_zeros, b_sigma)
    b.grad.data += mb.sample()


    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    W.data -= learning_rate * W.grad.data
    w.data -= learning_rate * w.grad.data
    c.data -= learning_rate * c.grad.data
    b.data -= learning_rate * b.grad.data

    # Manually zero the gradients after updating weights
    W.grad.data.zero_()
    w.grad.data.zero_()
    c.grad.data.zero_()
    b.grad.data.zero_()

print("W: ")
print(W)

print("w: ")
print(w)

Interesting, I never thought of adding noise. Nor did I check what the theoretical minimum network size was. I did all my experiments with 5 hidden units, and like yours, my models would occasionally get stuck in local minima. Had I realised that 5 units was more than necessary, I would have tried adding dropout.

I notice that your model doesn’t use any activation after the second linear layer where I used sigmoid, which may have been slowing down the flow of gradients in some cases.

Hi,

Regarding the architecture, I was just trying to replicate the results from the book.

But after looking at the gradients, I noticed I needed a mechanism to get out of a “vicious circle”, and noise seemed like a good starting point given recent results in the literature (reinforcement learning, training very deep networks, etc.).

Hope this helps!


Going back to the original question, (why does the PyTorch version not succeed as well as the numpy version), maybe we have been going about this the wrong way.

The PyTorch version is obviously not doing the same thing as the numpy version, and apart from the non-linearity between the linear layers, we haven’t touched on what those differences are, nor why they could be important.

Here are some of the differences between the numpy version and the pytorch version in the first post.

The weight initialisation

In the numpy version

# random float values uniformly taken from [0, 1)
W1 = np.random.random((input_dim, hidden_dim))
W2 = np.random.random((hidden_dim, output_dim))

In the PyTorch version (from the source code for nn.Linear)

# random values taken uniformly from (-1/sqrt(input_size), 1/sqrt(input_size))
stdv = 1. / math.sqrt(input_size)
self.weight.data.uniform_(-stdv, stdv)
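
So if we wanted the PyTorch model to start from the same kind of initialisation as the numpy version, we would presumably have to override the default, something like this sketch (layer sizes taken from the earlier posts):

import torch.nn as nn

hidden = nn.Linear(2, 5, bias=False)
hidden.weight.data.uniform_(0., 1.)   # overwrite the default init with U[0, 1), like np.random.random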

The learning rate

In the numpy version, learning_rate = 1
In the PyTorch version, learning_rate = 0.03

The loss function

In the numpy version

def cost(predicted, truth):
    return truth - predicted # N.B. this is NOT equivalent to L1 loss

In the PyTorch version

criterion = nn.L1Loss() # = abs(predictions - target)

The biggest difference here is in the way they are used.

In the PyTorch version criterion is added to the end of the computation graph and then differentiated with the rest of the computation graph. In the numpy version cost is used directly, which I think is equivalent to doing predictions.backward(cost(predictions)) in PyTorch.

I think the PyTorch loss function that would create the same result as the numpy update would be…

criterion = lambda predictions, target: 0.5 * torch.nn.functional.mse_loss(predictions, target, size_average=False)

Justification: d_criterion / d_predictions == predictions - target. The difference in sign is corrected later when applying the update. (The numpy version adds by doing self.W1 += lr*grad, whereas PyTorch SGD subtracts by doing param.data.add_(-group['lr'], param.grad.data).)
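
A quick numerical check of that claim (a standalone sketch with a single linear layer, not the original script):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]]))
target = Variable(torch.FloatTensor([[0.], [1.], [1.], [0.]]))
W0 = torch.randn(2, 1)

# Route 1: differentiate the loss 0.5 * sum((pred - target)^2).
W1 = Variable(W0.clone(), requires_grad=True)
pred = x.mm(W1)
(0.5 * (pred - target).pow(2).sum()).backward()

# Route 2: inject the numpy-style cost (target - pred) straight into backward().
W2 = Variable(W0.clone(), requires_grad=True)
pred = x.mm(W2)
pred.backward((target - pred).data)

# The two gradients are equal and opposite, so W += lr * grad (numpy) and
# W -= lr * grad (SGD) end up performing the same update.
print(W1.grad.data + W2.grad.data)   # ~ zeros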

I can’t see any other significant differences between the numpy version and the PyTorch version.

Hi,
Currently traveling and it is hard to read the code on the phone, so apologies in advance if I create more confusion…

In your numpy code, if your cost function returns a vector, aren’t you effectively applying a batch size of 1 versus PyTorch’s 4? In other words, you calculate the gradients for each example and then add them together?

I believe this is not what is happening in PyTorch, where you sum the error over the 4 examples and then backprop it.

The numpy version keeps the error for each sample separate. The PyTorch loss with size_average=False sums the errors; nevertheless, I think the end result is the same, since the gradient of a sum is the sum of the gradients of the parts.
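
A small sanity check of that statement (again just a sketch): accumulating per-sample backward() calls in .grad gives the same gradient as one backward() on the summed batch loss.

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]]))
y = Variable(torch.FloatTensor([[0.], [1.], [1.], [0.]]))
W0 = torch.randn(2, 1)

# One backward on the summed loss over the whole batch.
W_batch = Variable(W0.clone(), requires_grad=True)
(x.mm(W_batch) - y).pow(2).sum().backward()

# Four backwards, one per sample; the gradients accumulate in .grad.
W_single = Variable(W0.clone(), requires_grad=True)
for i in range(4):
    (x[i:i+1].mm(W_single) - y[i:i+1]).pow(2).sum().backward()

print(W_batch.grad.data - W_single.grad.data)   # ~ zeros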

Hi,

I will take a closer look at your code this weekend.

Was looking online for alternative PyTorch solutions and found this gist (I haven’t run it, as I am far away from a Linux box :o( ).

The only difference I can spot vs your implementation is the inner loop to update the weights for each sample.

Hope this helps!

Hi,

I think I managed to reproduce your numpy code with PyTorch, including the nice results :o)

Just be aware that I used the MSE error function.

Hope this helps!

import torch
import numpy as np
from torch.autograd import Variable
from torch import FloatTensor
import torch.nn.functional as F

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
D_in, H, D_out = 2, 5, 1

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(FloatTensor(np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])), requires_grad=False)
y = Variable(FloatTensor(np.array([[0., 1., 1., 0.]])), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
W1 = Variable(torch.Tensor(D_in, H).uniform_(0., 1.).type(dtype), requires_grad=True)
W2 = Variable(torch.Tensor(H, D_out).uniform_(0., 1.).type(dtype), requires_grad=True)

print("W1: ", W1.data)
print("W2: ", W2.data)


learning_rate = 1.

for t in range(10000):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.

    layer1 = F.sigmoid(x.mm(W1))
    layer2 = F.sigmoid(layer1.mm(W2))

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    # loss = (y_pred - y).pow(2).sum()

    loss = (y.t() - layer2).pow(2).sum()
    # print("Loss: ")
    # print(t, layer2.data)

    # print(t, loss.data[0])
    if t % 1000 == 0:
        print("Loss: ")
        print(t, loss.data[0])
        print("Current predictions: ")
        print(t, layer2.data)

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    W1.data -= learning_rate * W1.grad.data
    W2.data -= learning_rate * W2.grad.data

    # Manually zero the gradients after updating weights
    W1.grad.data.zero_()
    W2.grad.data.zero_()


print("W: ")
print(W1)

print("w: ")
print(W2)




And here is a version using nn.Linear, optim.SGD and F.mse_loss.

As I said above, the gradient of this MSE loss is a constant multiple of the cost function in the numpy version.

10000 epochs is overkill. Most of the time this thing gets to 100% accuracy in under 200 epochs.

I tried timing epochs with the numpy version and this pytorch version. The numpy version is ~3x faster per epoch. I am sure that a larger model would run faster in pytorch.

import torch
import numpy as np
from torch.autograd import Variable
from torch import FloatTensor
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
D_in, H, D_out = 2, 5, 1

x = Variable(FloatTensor(np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])), requires_grad=False)
y = Variable(FloatTensor(np.array([[0., 1., 1., 0.]])), requires_grad=False)

# Create two linear modules and initialize their weights
L1 = nn.Linear(D_in, H, bias=False)
L2 = nn.Linear(H, D_out, bias=False)
L1.weight.data.uniform_(0., 1.).type(dtype)
L2.weight.data.uniform_(0., 1.).type(dtype)

print("W1: ", L1.weight.data)
print("W2: ", L2.weight.data)

optimizer = optim.SGD([L1.weight, L2.weight], lr=1.)

success = False
for epoch in range(1000):
    layer1 = F.sigmoid(L1(x))
    layer2 = F.sigmoid(L2(layer1))

    loss = F.mse_loss(layer2, y, size_average=False)
    
    worst_error = (y.t() - layer2).abs().max()
    if not success and worst_error.data[0] < .5:
        print("100% accuracy achieved in", epoch+1, "epochs")
        success = True
    if worst_error.data[0] < .45:
        break

    if epoch % 100 == 0:
        print("Epoch %d: Loss %f  Predictions %s" % (epoch+1, loss.data[0], ' '.join(["%.3f" % p for p in (layer2.data.cpu().numpy())])))
    
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print("Epoch %d: Loss %f  Predictions %s" % (epoch+1, loss.data[0], ' '.join(["%.3f" % p for p in (layer2.data.cpu().numpy())])))

print("W1: ", L1.weight.data)
print("W2: ", L2.weight.data)

Very good! I think we managed to address the issue!

Regarding run-time performance, I would say it is somewhat expected; after all, PyTorch needs to compute the gradients automatically, while the numpy version has hand-coded formulas. I am used to seeing performance costs of around 4x in Stan-Math and Autograd, for example.

This thread was very interesting given the “simplicity” of the problem, yet online you find similar issues with CNTK and TensorFlow, for example.

Wouldn’t it be good if a clean version of the solution above ended up as an example on the PyTorch website?

@pedronahum that’s interesting! I would have expected nn.Sequential to do the same as the code you’ve posted. Currently, you’re manually passing the weights to the optimizer.

Hi @alvations,

And it does. Please have a look at the code below (minor changes to the code from @jpeg729)

import torch
import numpy as np
from torch.autograd import Variable
from torch import FloatTensor
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
D_in, H, D_out = 2, 5, 1

x = Variable(FloatTensor(np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])), requires_grad=False)
y = Variable(FloatTensor(np.array([[0., 1., 1., 0.]])), requires_grad=False)

# Create two linear modules and initialize their weights
L1 = nn.Linear(D_in, H, bias=False)
L2 = nn.Linear(H, D_out, bias=False)
L1.weight.data.uniform_(0., 1.).type(dtype)
L2.weight.data.uniform_(0., 1.).type(dtype)

model = nn.Sequential(L1,
                      nn.Sigmoid(),
                      L2,
                      nn.Sigmoid())


print("W1: ", L1.weight.data)
print("W2: ", L2.weight.data)

optimizer = optim.SGD(model.parameters(), lr=1.)

success = False
for epoch in range(10000):

    layer2 = model(x)

    loss = F.mse_loss(layer2, y, size_average=False)

    worst_error = (y.t() - layer2).abs().max()
    if not success and worst_error.data[0] < .5:
        print("100% accuracy achieved in", epoch + 1, "epochs")
        success = True
    if worst_error.data[0] < .05:
        break

    if epoch % 100 == 0:
        print("Epoch %d: Loss %f  Predictions %s" % (
        epoch + 1, loss.data[0], ' '.join(["%.3f" % p for p in (layer2.data.cpu().numpy())])))

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print("Epoch %d: Loss %f  Predictions %s" % (
epoch + 1, loss.data[0], ' '.join(["%.3f" % p for p in (layer2.data.cpu().numpy())])))

print("W1: ", L1.weight.data)
print("W2: ", L2.weight.data)