Simple Linear/Sigmoid Network not learning

Hello Samy!

I think it is rather difficult to train a small network to reproduce an XOR
gate.

XOR has a cross-term between its inputs and is the classic example of a
function that is not linearly separable. This makes it a poor fit for a
“narrow,” “shallow” network.

Please take a look at this thread about a nearly identical problem:

Some further comments in line:

Continue to experiment with learning rates and many random
initializations, and continue to train for many iterations. It looks
like training this kind of network easily gets stuck away from a
good solution.
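
For example, here is a sketch of such a restart loop. The two-layer
Linear/Sigmoid model and the learning-rate grid are placeholders for
your own setup, and the iteration count is illustrative, not tuned:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # the four XOR input/target pairs
    x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])

    best_loss = float('inf')
    for lr in (0.03, 0.1, 0.3, 1.0):       # learning-rate grid (illustrative)
        for seed in range(10):             # several random initializations
            torch.manual_seed(seed)
            model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(),
                                  nn.Linear(2, 1), nn.Sigmoid())
            opt = torch.optim.SGD(model.parameters(), lr=lr)
            for _ in range(10000):         # train for many iterations
                opt.zero_grad()
                loss = F.mse_loss(model(x), y)
                loss.backward()
                opt.step()
            best_loss = min(best_loss, loss.item())
    print(best_loss)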

Play around with making your network deeper by adding one or
more hidden layers, e.g., something like:

            self.x1 = nn.Linear(2,4)
            self.x2 = nn.Linear(4,2)
            self.x3 = nn.Linear(2,1)

or:

            self.x1 = nn.Linear(2,4)
            self.x2 = nn.Linear(4,4)
            self.x3 = nn.Linear(4,2)
            self.x4 = nn.Linear(2,1)
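
For concreteness, here is the second variant written out as a complete
module. The Sigmoid hidden activations and the class name XorNet are
my assumptions, not your code; note that the last layer's output is
returned directly, for the reasons discussed below:

    import torch
    import torch.nn as nn

    class XorNet(nn.Module):                # hypothetical name
        def __init__(self):
            super().__init__()
            self.x1 = nn.Linear(2, 4)
            self.x2 = nn.Linear(4, 4)
            self.x3 = nn.Linear(4, 2)
            self.x4 = nn.Linear(2, 1)

        def forward(self, x):
            x = torch.sigmoid(self.x1(x))   # hidden activations: an assumption,
            x = torch.sigmoid(self.x2(x))   # based on your original Sigmoid network
            x = torch.sigmoid(self.x3(x))
            return self.x4(x)               # raw output of the last Linear layer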

Try making your network wider:

            self.x1 = nn.Linear(2,16)
            self.x2 = nn.Linear(16,1)
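
The same wrapping works for the wider variant; written compactly with
nn.Sequential it might look like:

    import torch.nn as nn

    wide_net = nn.Sequential(
        nn.Linear(2, 16),
        nn.Sigmoid(),        # hidden activation (again, an assumption)
        nn.Linear(16, 1),    # raw output, no final Sigmoid
    )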

I would suggest getting rid of the final Sigmoid. My intuition is that it
will actually hurt here: you have to saturate the Sigmoid (which requires
large network parameters) to closely match your val_expected. Instead,
return the result of your last Linear layer as the output of your network.
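
Concretely, assuming a two-layer network like the wider one above, the
forward() would end with the raw Linear output rather than a Sigmoid:

    import torch
    import torch.nn as nn

    class TwoLayerNet(nn.Module):    # hypothetical stand-in for your network
        def __init__(self):
            super().__init__()
            self.x1 = nn.Linear(2, 16)
            self.x2 = nn.Linear(16, 1)

        def forward(self, x):
            x = torch.sigmoid(self.x1(x))
            # rather than torch.sigmoid(self.x2(x)), which only approaches
            # 0.0 or 1.0 when the weights grow large enough to saturate it:
            return self.x2(x)        # raw logit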

Conventional wisdom suggests that BCEWithLogitsLoss would be
a more natural loss function for your problem. You should be able
to get your network to train with MSELoss, but maybe not as easily.
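
BCEWithLogitsLoss applies the Sigmoid internally, in a numerically
stable way, so it expects raw logits. Reusing the XorNet and the XOR
tensors from the sketches above, usage would look like:

    model = XorNet()              # from the sketch above
    loss_fn = nn.BCEWithLogitsLoss()
    loss = loss_fn(model(x), y)   # y holds float targets, 0.0 or 1.0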

Consider using momentum with SGD (as well as various learning
rates). You might also play around with other optimizers, such as
Adam.
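
For example (the lr and momentum values here are just starting points
to tune):

    import torch

    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # or, as an alternative:
    opt = torch.optim.Adam(model.parameters(), lr=0.01)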

I would first remove the final Sigmoid, use BCEWithLogitsLoss,
and make the network deeper and/or wider.
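
Putting those three changes together, a minimal end-to-end sketch might
look like the following (the architecture, learning rate, and iteration
count are illustrative, not tuned):

    import torch
    import torch.nn as nn

    # the four XOR input/target pairs
    x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])

    # deeper and wider than the original, with no final Sigmoid
    model = nn.Sequential(
        nn.Linear(2, 16),
        nn.Sigmoid(),
        nn.Linear(16, 4),
        nn.Sigmoid(),
        nn.Linear(4, 1),      # raw logits
    )

    loss_fn = nn.BCEWithLogitsLoss()
    opt = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)

    for step in range(5000):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    # a logit > 0 corresponds to a predicted probability > 0.5
    print((model(x) > 0).float())   # ideally [[0.], [1.], [1.], [0.]]

If a given run doesn't converge, trying a different random seed or
learning rate, per the comments above, is the first thing to do.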

Good luck.

K. Frank
