# Simple Linear/Sigmoid Network not learning

I’m learning PyTorch and tried to train a network as an XOR gate. Everything runs smoothly, but it just does not learn. It does change its weights, yet for every input it converges to a result that is far from the expected one.

I have tried many learning rates and weight initializations.

The inputs are A and B, and it should return 1 if both are equal or 0 otherwise, like this:

```
[0,0] => 1
[0,1] => 0
[1,0] => 0
[1,1] => 1
```

This is my attempt at building and training the model:

```
import torch
import torch.nn as nn

class Network(nn.Module):

    def __init__(self):
        super(Network, self).__init__()
        self.x1 = nn.Linear(2,4)
        self.s1 = nn.Sigmoid()
        self.x2 = nn.Linear(4,1)
        self.s2 = nn.Sigmoid()

    def init(self):
        nn.init.uniform_(self.x1.weight)
        nn.init.uniform_(self.x2.weight)

    def forward(self, feats):
        f1 = torch.tensor(feats).float()
        xr1 = self.x1(f1)
        xs1 = self.s1(xr1)
        xr2 = self.x2(xs1)
        out = self.s2(xr2)
        return out

    def train(self, val_expected, feats_next):
        val_expected_tensor = torch.tensor(val_expected)
        criterion = nn.MSELoss()
        optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
        def closure():
            resp = self.forward(feats_next)
            error = criterion(resp, val_expected_tensor)
            error.backward()
            return error
        optimizer.step(closure)

net = Network()
net.init()

for input in ([0.,0.],[0.,1.],[1.,0.],[1.,1.]):
    response = net.forward(input)
    print(response)

print("--TRAIN START-")
for i in range(1000):
    net.train([1.],[0.,0.])
    net.train([0.],[1.,0.])
    net.train([0.],[0.,1.])
    net.train([1.],[1.,1.])
print("---TRAIN END---")

for input in ([0.,0.],[0.,1.],[1.,0.],[1.,1.]):
    response = net.forward(input)
    print(response)
```

This is a run with 100000 iterations at 0.001 learning rate:

```
tensor([0.7726], grad_fn=)
--TRAIN START-
*.........*.........*.........*.........*.........*.........*.........*.........*.........*.........
---TRAIN END---
```

I’m really lost here. Shouldn’t this work?

Hello Samy!

I think it is rather difficult to train a small network to reproduce an XOR
gate.

XOR has cross-terms that are, in some sense, highly non-linear. This
is an imperfect fit for a “narrow,” “shallow” network.

Continue to experiment with learning rates and many random
initializations, and continue to train for many iterations. It looks
like it is easy for the training of this kind of network to get stuck
away from a good solution.
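One way to run that kind of experiment is sketched below. It is only an illustration: the seeds, learning rates, step count, and the helper name `final_loss` are assumptions, and it uses PyTorch's default initialization rather than `uniform_`. It trains on all four rows at once and clears the gradients each step with `zero_grad()`, since PyTorch accumulates gradients across `backward()` calls.

```python
import itertools
import torch
import torch.nn as nn

# Truth table from the question (output is 1 when both inputs are equal).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[1.], [0.], [0.], [1.]])

def final_loss(seed, lr, steps=500):
    """Train the 2-4-1 sigmoid network from one random start; return the last loss."""
    torch.manual_seed(seed)  # makes this particular initialization reproducible
    model = nn.Sequential(nn.Linear(2, 4), nn.Sigmoid(),
                          nn.Linear(4, 1), nn.Sigmoid())
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = criterion(model(X), Y)  # full batch: all four rows together
        loss.backward()
        optimizer.step()
    return loss.item()

# Hypothetical grid: three seeds and two learning rates.
results = {(seed, lr): final_loss(seed, lr)
           for seed, lr in itertools.product(range(3), (0.1, 1.0))}

# Print the runs from best to worst final loss.
for (seed, lr), loss in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"seed={seed} lr={lr} final MSE={loss:.4f}")
```

Comparing the final losses across rows gives a quick feel for how sensitive the small network is to its starting point and step size.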

You could also try more hidden layers, e.g., something like:

```
self.x1 = nn.Linear(2,4)
self.x2 = nn.Linear(4,2)
self.x3 = nn.Linear(2,1)
```

or:

```
self.x1 = nn.Linear(2,4)
self.x2 = nn.Linear(4,4)
self.x3 = nn.Linear(4,2)
self.x4 = nn.Linear(2,1)
```

or a wider hidden layer:

```
self.x1 = nn.Linear(2,16)
self.x2 = nn.Linear(16,1)
```

I would suggest getting rid of the final `Sigmoid`. My intuition is that it will
actually hurt here: you have to saturate the `Sigmoid` (which requires large
network parameters) to closely match your `val_expected`. Instead, return the
result of your last `Linear` layer as the output of your network.
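To put a rough number on the saturation point, the inverse of the sigmoid tells you how large the pre-activation must be for a given output. This is plain arithmetic rather than anything specific to your network, and `logit` here is a small helper defined for the illustration.

```python
import math

def logit(p):
    """Inverse of the sigmoid: the pre-activation z with sigmoid(z) == p."""
    return math.log(p / (1.0 - p))

# How large must the input to the final Sigmoid be to get close to a target of 1?
for p in (0.9, 0.99, 0.999):
    print(f"sigmoid(z) = {p} needs z = {logit(p):.2f}")
```

An output of 0.99 already needs a pre-activation of about 4.6, and every extra digit of accuracy pushes it higher. Since the hidden `Sigmoid` outputs lie between 0 and 1, that has to come from fairly large weights or biases in the last `Linear` layer.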

Conventional wisdom suggests that `BCEWithLogitsLoss` would be
a more natural loss function for your problem. You should be able
to get your network to train with `MSELoss`, but maybe not as easily.
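As a small sketch of how `BCEWithLogitsLoss` fits together with a network that returns raw `Linear` outputs: it takes the raw values (logits) directly and applies the sigmoid internally. The logit values below are made up for the illustration.

```python
import torch
import torch.nn as nn

# Made-up raw outputs of a final Linear layer, one per truth-table row.
logits = torch.tensor([[2.0], [-1.0], [0.5], [3.0]])
targets = torch.tensor([[1.], [0.], [0.], [1.]])

# BCEWithLogitsLoss consumes logits directly...
with_logits = nn.BCEWithLogitsLoss()(logits, targets)
# ...and matches applying a sigmoid first and then BCELoss.
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(with_logits.item(), two_step.item())

# At inference time, apply the sigmoid yourself to read the output as a probability.
probs = torch.sigmoid(logits)
preds = (probs > 0.5).float()
print(preds.flatten())
```

The two loss values agree up to floating-point error. The combined version is the numerically safer of the two when logits become large.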

Consider using `momentum` with `SGD` (as well as various learning
rates). You might also play around with other optimizers, such as
`Adam`.

I would first remove the final `Sigmoid`, use `BCEWithLogitsLoss`,
and make the network deeper and/or wider.
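Putting those suggestions together, a minimal sketch might look like the following. The hidden width of 16, the choice of `Adam`, the learning rate, the step count, and the seed are all assumptions made for illustration, and convergence from any particular initialization is not guaranteed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Truth table from the question (output is 1 when both inputs are equal).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[1.], [0.], [0.], [1.]])

# Wider hidden layer, and no Sigmoid after the last Linear layer.
model = nn.Sequential(nn.Linear(2, 16), nn.Sigmoid(), nn.Linear(16, 1))
criterion = nn.BCEWithLogitsLoss()   # expects raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

initial_loss = criterion(model(X), Y).item()
for step in range(2000):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(X), Y)    # full batch: all four rows together
    loss.backward()
    optimizer.step()
last_loss = loss.item()

# Convert logits to probabilities only when inspecting the results.
with torch.no_grad():
    probs = torch.sigmoid(model(X))
print(f"loss: {initial_loss:.4f} -> {last_loss:.4f}")
print(probs.flatten())
```

If the printed probabilities do not end up near 1, 0, 0, 1, try another seed or learning rate before changing the architecture.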

Good luck.

K. Frank
