Simple Linear/Sigmoid Network not learning

I’m learning PyTorch and tried to train a network as an XOR gate. Everything runs smoothly, but it just does not learn. It does change its weights, yet for every input it converges to a result that is far from the expected values.

I have tried many learning rates and weight initializations.

So the inputs are the A and B values, and the network should return 1 if both are equal, or 0 otherwise, like this:

    [0,0] => 1
    [0,1] => 0
    [1,0] => 0
    [1,1] => 1

This is my attempt at modeling and training the network:

    import torch
    import torch.nn as nn
    
    class Network(nn.Module):
        
        def __init__(self):
            super(Network, self).__init__()
            self.x1 = nn.Linear(2,4)
            self.s1 = nn.Sigmoid()
            self.x2 = nn.Linear(4,1)
            self.s2 = nn.Sigmoid()
        
        def init(self):
            nn.init.uniform_(self.x1.weight)
            nn.init.uniform_(self.x2.weight)
    
        def forward(self, feats):
            f1 = torch.tensor(feats).float()
            xr1= self.x1(f1)
            xs1= self.s1(xr1)
            xr2= self.x2(xs1)
            out= self.s2(xr2)        
            return out  
    
        def train(self,val_expected,feats_next):
            val_expected_tensor = torch.tensor(val_expected)
            criterion = nn.MSELoss()
            optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
            def closure():
                optimizer.zero_grad()
                resp = self.forward(feats_next)
                error = criterion(resp,val_expected_tensor)
                error.backward()
                return error
            optimizer.step(closure)
    
    net = Network()
    net.init()
    
    for input in ([0.,0.],[0.,1.],[1.,0.],[1.,1.]):
        response=net.forward(input)
        print(response)
    
    print ("--TRAIN START-")
    for i in range(1000):
        net.train([1.],[0.,0.])
        net.train([0.],[1.,0.])
        net.train([0.],[0.,1.])
        net.train([1.],[1.,1.])
    print ("---TRAIN END---")
    
    for input in ([0.,0.],[0.,1.],[1.,0.],[1.,1.]):
        response=net.forward(input)
        print(response)

This is a run with 100,000 iterations at a learning rate of 0.001:

    tensor([0.7726], grad_fn=<SigmoidBackward>)
    tensor([0.7954], grad_fn=<SigmoidBackward>)
    tensor([0.8229], grad_fn=<SigmoidBackward>)
    tensor([0.8410], grad_fn=<SigmoidBackward>)
    --TRAIN START-
    *.........*.........*.........*.........*.........*.........*.........*.........*.........*.........
    ---TRAIN END---
    tensor([0.6311], grad_fn=<SigmoidBackward>)
    tensor([0.6459], grad_fn=<SigmoidBackward>)
    tensor([0.6770], grad_fn=<SigmoidBackward>)
    tensor([0.6906], grad_fn=<SigmoidBackward>)

I’m really lost here. Shouldn’t this work?

Hello Samy!

I think it is rather difficult to train a small network to reproduce an XOR
gate.

XOR has cross-terms that are, in some sense, highly non-linear. This
is an imperfect fit for a “narrow,” “shallow” network.

Please take a look at this thread about a nearly identical problem:

Some further comments in line:

Continue to experiment with learning rates and many random
initializations, and continue to train for many iterations. It looks
like it is easy for the training of this kind of network to get stuck
away from a good solution.
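
For example, a rough sketch of a random-restart loop using your existing class (the restart count and the per-restart iteration count are arbitrary choices to experiment with):

    for restart in range(10):              # several independent random initializations
        net = Network()
        net.init()
        for i in range(10000):             # train each candidate for many iterations
            net.train([1.], [0., 0.])
            net.train([0.], [1., 0.])
            net.train([0.], [0., 1.])
            net.train([1.], [1., 1.])
        # inspect how close this restart got to the targets
        for inp in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
            print(restart, inp, net.forward(inp))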

Play around with making your network deeper by adding one or
more hidden layers, e.g., something like:

            self.x1 = nn.Linear(2,4)
            self.x2 = nn.Linear(4,2)
            self.x3 = nn.Linear(2,1)

or:

            self.x1 = nn.Linear(2,4)
            self.x2 = nn.Linear(4,4)
            self.x3 = nn.Linear(4,2)
            self.x4 = nn.Linear(2,1)
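
If you go that route, the forward pass just chains the extra layers with an activation between each. Here is a sketch for the first (three-layer) variant, keeping the Sigmoid activations from your original model for now:

        def forward(self, feats):
            f = torch.tensor(feats).float()
            h1 = torch.sigmoid(self.x1(f))     # 2 -> 4
            h2 = torch.sigmoid(self.x2(h1))    # 4 -> 2
            return torch.sigmoid(self.x3(h2))  # 2 -> 1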

Try making your network wider:

            self.x1 = nn.Linear(2,16)
            self.x2 = nn.Linear(16,1)

I would suggest getting rid of the Sigmoid. My intuition is that it will
actually hurt here – you have to saturate the Sigmoid (large network
parameters) to match closely your val_expected. Return the result
of your last Linear layer as the output of your network.

Conventional wisdom suggests that BCEWithLogitsLoss would be
a more natural loss function for your problem. You should be able
to get your network to train with MSELoss, but maybe not as easily.
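
A minimal sketch of how that would plug in, assuming you have removed the final Sigmoid so that forward() returns the raw output of the last Linear layer:

    criterion = nn.BCEWithLogitsLoss()     # applies the Sigmoid internally
    logits = net.forward([0., 0.])         # raw output of the last Linear layer
    target = torch.tensor([1.])            # expected value for this input
    loss = criterion(logits, target)
    loss.backward()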

Consider using momentum with SGD (as well as various learning
rates). You might also play around with other optimizers, such as
Adam.
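
For example (the learning-rate and momentum values here are just starting points to experiment with):

    optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
    # or
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)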

I would first remove the final Sigmoid, use BCEWithLogitsLoss,
and make the network deeper and/or wider.
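
Putting those pieces together, here is a minimal sketch of the kind of thing I have in mind (the hidden-layer width, learning rate, and iteration count are just guesses to start from; this version also feeds all four examples to the network as a single batch tensor):

    import torch
    import torch.nn as nn

    class Network(nn.Module):
        def __init__(self):
            super(Network, self).__init__()
            self.x1 = nn.Linear(2, 16)     # wider hidden layer
            self.x2 = nn.Linear(16, 1)     # no Sigmoid on the output

        def forward(self, feats):
            h = torch.sigmoid(self.x1(feats))
            return self.x2(h)              # raw logit

    net = Network()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

    inputs  = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    targets = torch.tensor([[1.], [0.], [0.], [1.]])

    for i in range(5000):
        optimizer.zero_grad()
        logits = net(inputs)               # whole batch at once
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()

    print(torch.sigmoid(net(inputs)))      # should end up close to the targets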

Good luck.

K. Frank
