# Simple pattern is not detected by simple linear model

I have a very simple example with four training samples and four labels. I am trying to fit a simple logistic regression to it, expecting it to overfit to zero loss, but the model does not converge.

Here is the rule I am hoping the model will learn:

• When input is [1., 0., 0., 1.], the label is 1
• When input is [0., 1., 1., 0.], the label is 1
• When input is [1., 0., 1., 0.], the label is 0
• When input is [0., 1., 0., 1.], the label is 0

Is such a pattern something a non-linear model cannot capture? Any idea what I am doing wrong?

``````
import torch
import torch.nn as nn

training_samples = torch.tensor([[0., 1., 1., 0.],
                                 [1., 0., 0., 1.],
                                 [0., 1., 0., 1.],
                                 [1., 0., 1., 0.]])

labels = torch.tensor([1., 1., 0., 0.]).view(-1, 1)

def normalize(x):
    # standardize each column (feature) to zero mean and unit standard deviation
    x_normed = (x - x.mean(0, keepdim=True)) / x.std(0, keepdim=True)
    return x_normed

class LogisticRegression(torch.nn.Module):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear1 = torch.nn.Linear(4, 2)
        self.linear2 = torch.nn.Linear(2, 2)
        self.linear3 = torch.nn.Linear(2, 1)

    def forward(self, x):
        x = torch.relu(self.linear1(x))
        x = torch.relu(self.linear2(x))
        x = torch.sigmoid(self.linear3(x))
        return x

model = LogisticRegression()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=.01)

model.train()
for epoch in range(10):
    shuffle = torch.randperm(len(labels))  # shuffling the training data
    labels = labels[shuffle]  # shuffle labels
    training_samples = training_samples[shuffle]  # shuffle features
    y_pred = model(normalize(training_samples))  # normalization does not help
    loss = criterion(y_pred, labels)
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()
    optimizer.step()
    print('Epoch %d | Loss: %.4f' % (epoch, loss.item()))
``````

Hello pemfir!

The short answer is that a variant of your neural network can capture this pattern if you add a third “hidden neuron” to your first layer.

That is, change the layer sizes in your `__init__()` from `Linear(4, 2)`, `Linear(2, 2)`, `Linear(2, 1)` to something like this:

``````
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear1 = torch.nn.Linear(4, 3)
        self.linear2 = torch.nn.Linear(3, 2)
        self.linear3 = torch.nn.Linear(2, 1)
    ...
``````

Note that, to train this “three-neuron” network to a low loss, I increased the learning rate to `0.1` and the number of epochs to `1000`.

I don’t have a good explanation as to why adding this neuron is necessary / sufficient, or how to analyze what other minimal tweaks would work.

Upon inspection, your samples contain redundant information: `s[i, 1] == s[i, 2]` if, and only if, `s[i, 0] == s[i, 3]`. Your labels are `1` exactly when this equality holds, so they can be written as:

`l[i] = s[i, 1] * s[i, 2] + (1 - s[i, 1]) * (1 - s[i, 2])`
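As a quick sanity check, here is a minimal plain-Python sketch (no PyTorch needed) that verifies both observations on the four training samples from the question. It also checks a slightly stronger fact visible in the data: column 0 is the complement of column 1 and column 3 is the complement of column 2. The variable names are illustrative, not from the original posts.

```python
# The four training samples and labels from the question.
samples = [
    [0., 1., 1., 0.],
    [1., 0., 0., 1.],
    [0., 1., 0., 1.],
    [1., 0., 1., 0.],
]
labels = [1., 1., 0., 0.]

for s, l in zip(samples, labels):
    # Redundancy: s[1] == s[2] exactly when s[0] == s[3].
    assert (s[1] == s[2]) == (s[0] == s[3])
    # Stronger redundancy in this data: columns 0 and 3 are complements of 1 and 2.
    assert s[0] == 1 - s[1] and s[3] == 1 - s[2]
    # The label is 1 exactly when the equality holds.
    assert l == (1. if s[1] == s[2] else 0.)
    # Closed form from above (an XNOR of s[1] and s[2]).
    assert s[1] * s[2] + (1 - s[1]) * (1 - s[2]) == l

print("all checks passed")
```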

Neural networks can certainly reproduce (good approximations to) quadratic functions, but I don’t have a good understanding of how small a neural network can be and still do this.
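To make the quadratic form explicit, write $a = s_{i,1}$ and $b = s_{i,2}$. Expanding the label formula gives a linear part plus an $ab$ cross term. The standard XOR-style argument sketched below (not from the original posts) shows that no threshold on a linear function $w_1 a + w_2 b + c$ can fit all four samples. Because column 0 equals $1 - a$ and column 3 equals $1 - b$ in these samples, any linear function of the four original inputs reduces to this form, so the argument also covers a plain logistic regression on the question's data.

$$
\begin{aligned}
\ell &= ab + (1-a)(1-b) = 1 - a - b + 2ab \\[4pt]
(a, b) = (1, 1) \mapsto 1 &: \quad w_1 + w_2 + c > 0 \\
(a, b) = (0, 0) \mapsto 1 &: \quad c > 0 \\
(a, b) = (1, 0) \mapsto 0 &: \quad w_1 + c \le 0 \\
(a, b) = (0, 1) \mapsto 0 &: \quad w_2 + c \le 0
\end{aligned}
$$

Adding the first two conditions gives $w_1 + w_2 + 2c > 0$, while adding the last two gives $w_1 + w_2 + 2c \le 0$, a contradiction. Some nonlinearity, such as hidden ReLU units or an explicit $ab$ feature, is therefore required.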

It’s interesting to note that if we “help” the network by adding a single product feature to each input sample, we can train successfully with only two hidden neurons in the first layer.

So, let’s get rid of the redundant inputs and add the (mathematically redundant) helper product:

``````
training_samples = torch.tensor([[1., 1., 1.],
                                 [0., 0., 0.],
                                 [1., 0., 0.],
                                 [0., 1., 0.]])
``````
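A minimal plain-Python sketch, assuming the column order above is `s[i, 1]`, `s[i, 2]`, and their product, showing how these rows follow from the question's original four-column samples. It also shows that, with the product column `p`, the labels become an exact linear function of the inputs, `1 - a - b + 2*p`, which is a plausible reason the two-neuron network copes once it is given this feature. The helper name `augment` is hypothetical.

```python
# Original four-column samples and labels from the question.
original = [
    [0., 1., 1., 0.],
    [1., 0., 0., 1.],
    [0., 1., 0., 1.],
    [1., 0., 1., 0.],
]
labels = [1., 1., 0., 0.]

def augment(s):
    """Hypothetical helper: keep columns 1 and 2, append their product."""
    a, b = s[1], s[2]
    return [a, b, a * b]

samples_aug = [augment(s) for s in original]
print(samples_aug)
# [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# With the product feature p, one fixed linear function reproduces every label.
for (a, b, p), l in zip(samples_aug, labels):
    assert 1 - a - b + 2 * p == l
```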

Here is a PyTorch 0.3.0 script that illustrates training with a third neuron and with the helper product:

``````
import torch
torch.__version__

torch.manual_seed (2020)

# the two non-redundant inputs, s[i, 1] and s[i, 2] from the question
samples = torch.autograd.Variable (torch.FloatTensor ([
    [1., 1.],
    [0., 0.],
    [1., 0.],
    [0., 1.]
]))

# the same two inputs plus the helper product s[i, 1] * s[i, 2]
samples_aug = torch.autograd.Variable (torch.FloatTensor ([
    [1., 1., 1.],
    [0., 0., 0.],
    [1., 0., 0.],
    [0., 1., 0.]
]))

labels = torch.autograd.Variable (torch.FloatTensor ([1., 1., 0., 0.]).view(-1, 1))

# model_<nInput>_<nHidden>: nInput -> nHidden -> 2 -> 1
nInput = 2
nHidden = 2

model_2_2 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

nInput = 3
nHidden = 2

model_3_2 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

nInput = 2
nHidden = 3

model_2_3 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

nInput = 3
nHidden = 3
model_3_3 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

criterion = torch.nn.BCELoss()

# Note: these loops never call zero_grad(), so gradients accumulate across epochs.
# loss.data[0] is the PyTorch 0.3 idiom; on 0.4 and later use loss.item().
optimizer_2_2 = torch.optim.SGD (model_2_2.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))  # shuffling the training data
    y_pred = model_2_2 (samples[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    loss.backward()
    optimizer_2_2.step()
    if  (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_2_2 (samples) =', [round (v, 4)  for v in model_2_2 (samples).data.squeeze().tolist()])

optimizer_3_2 = torch.optim.SGD (model_3_2.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))  # shuffling the training data
    y_pred = model_3_2 (samples_aug[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    loss.backward()
    optimizer_3_2.step()
    if  (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_3_2 (samples_aug) =', [round (v, 4)  for v in model_3_2 (samples_aug).data.squeeze().tolist()])

optimizer_2_3 = torch.optim.SGD (model_2_3.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))  # shuffling the training data
    y_pred = model_2_3 (samples[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    loss.backward()
    optimizer_2_3.step()
    if  (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_2_3 (samples) =', [round (v, 4)  for v in model_2_3 (samples).data.squeeze().tolist()])

optimizer_3_3 = torch.optim.SGD (model_3_3.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))  # shuffling the training data
    y_pred = model_3_3 (samples_aug[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    loss.backward()
    optimizer_3_3.step()
    if  (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_3_3 (samples_aug) =', [round (v, 4)  for v in model_3_3 (samples_aug).data.squeeze().tolist()])
``````

And here is the output:

``````
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x00000210DE836630>
>>>
>>> samples = torch.autograd.Variable (torch.FloatTensor ([
...     [1., 1.],
...     [0., 0.],
...     [1., 0.],
...     [0., 1.]
... ]))
>>>
>>> samples_aug = torch.autograd.Variable (torch.FloatTensor ([
...     [1., 1., 1.],
...     [0., 0., 0.],
...     [1., 0., 0.],
...     [0., 1., 0.]
... ]))
>>>
>>> labels = torch.autograd.Variable (torch.FloatTensor ([1., 1., 0., 0.]).view(-1, 1))
>>>
>>> nInput = 2
>>> nHidden = 2
>>>
>>> model_2_2 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> nInput = 3
>>> nHidden = 2
>>>
>>> model_3_2 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> nInput = 2
>>> nHidden = 3
>>>
>>> model_2_3 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> nInput = 3
>>> nHidden = 3
>>> model_3_3 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> criterion = torch.nn.BCELoss()
>>>
>>>
>>> optimizer_2_2 = torch.optim.SGD (model_2_2.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))  # shuffling the training data
...     y_pred = model_2_2 (samples[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     loss.backward()
...     optimizer_2_2.step()
...     if  (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_2_2 (samples) =', [round (v, 4)  for v in model_2_2 (samples).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6932
model_2_2 (samples) = [0.5061, 0.5061, 0.5061, 0.5061]
Epoch 199 | Loss: 0.6931
model_2_2 (samples) = [0.5002, 0.5002, 0.5002, 0.5002]
Epoch 299 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 399 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 499 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 599 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 699 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 799 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 899 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 999 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
>>>
>>> optimizer_3_2 = torch.optim.SGD (model_3_2.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))  # shuffling the training data
...     y_pred = model_3_2 (samples_aug[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     loss.backward()
...     optimizer_3_2.step()
...     if  (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_3_2 (samples_aug) =', [round (v, 4)  for v in model_3_2 (samples_aug).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6926
model_3_2 (samples_aug) = [0.5092, 0.5072, 0.5093, 0.5058]
Epoch 199 | Loss: 0.6889
model_3_2 (samples_aug) = [0.5008, 0.4941, 0.5008, 0.4852]
Epoch 299 | Loss: 0.6755
model_3_2 (samples_aug) = [0.5063, 0.4866, 0.5063, 0.448]
Epoch 399 | Loss: 0.6013
model_3_2 (samples_aug) = [0.542, 0.5209, 0.5416, 0.2982]
Epoch 499 | Loss: 0.4251
model_3_2 (samples_aug) = [0.6544, 0.6344, 0.5311, 0.0498]
Epoch 599 | Loss: 0.1344
model_3_2 (samples_aug) = [0.7972, 0.7972, 0.0621, 0.0127]
Epoch 699 | Loss: 0.0678
model_3_2 (samples_aug) = [0.8842, 0.8842, 0.0163, 0.0068]
Epoch 799 | Loss: 0.0442
model_3_2 (samples_aug) = [0.9216, 0.9216, 0.0081, 0.0046]
Epoch 899 | Loss: 0.0325
model_3_2 (samples_aug) = [0.9414, 0.9414, 0.0052, 0.0035]
Epoch 999 | Loss: 0.0255
model_3_2 (samples_aug) = [0.9534, 0.9534, 0.0036, 0.0027]
>>>
>>> optimizer_2_3 = torch.optim.SGD (model_2_3.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))  # shuffling the training data
...     y_pred = model_2_3 (samples[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     loss.backward()
...     optimizer_2_3.step()
...     if  (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_2_3 (samples) =', [round (v, 4)  for v in model_2_3 (samples).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6537
model_2_3 (samples) = [0.538, 0.5671, 0.4874, 0.53]
Epoch 199 | Loss: 0.5327
model_2_3 (samples) = [0.6096, 0.67, 0.3885, 0.529]
Epoch 299 | Loss: 0.2323
model_2_3 (samples) = [0.8878, 0.8619, 0.2533, 0.3082]
Epoch 399 | Loss: 0.0967
model_2_3 (samples) = [0.9735, 0.9617, 0.145, 0.1497]
Epoch 499 | Loss: 0.0577
model_2_3 (samples) = [0.9885, 0.9832, 0.0957, 0.0955]
Epoch 599 | Loss: 0.0408
model_2_3 (samples) = [0.9925, 0.9894, 0.0694, 0.0694]
Epoch 699 | Loss: 0.0308
model_2_3 (samples) = [0.9951, 0.993, 0.0547, 0.054]
Epoch 799 | Loss: 0.0249
model_2_3 (samples) = [0.996, 0.9945, 0.0439, 0.0439]
Epoch 899 | Loss: 0.0210
model_2_3 (samples) = [0.9964, 0.9953, 0.037, 0.037]
Epoch 999 | Loss: 0.0178
model_2_3 (samples) = [0.9973, 0.9964, 0.0318, 0.0318]
>>>
>>> optimizer_3_3 = torch.optim.SGD (model_3_3.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))  # shuffling the training data
...     y_pred = model_3_3 (samples_aug[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     loss.backward()
...     optimizer_3_3.step()
...     if  (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_3_3 (samples_aug) =', [round (v, 4)  for v in model_3_3 (samples_aug).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6778
model_3_3 (samples_aug) = [0.5113, 0.4968, 0.5173, 0.4568]
Epoch 199 | Loss: 0.5863
model_3_3 (samples_aug) = [0.5727, 0.5431, 0.5767, 0.2665]
Epoch 299 | Loss: 0.4890
model_3_3 (samples_aug) = [0.6795, 0.6138, 0.6359, 0.0656]
Epoch 399 | Loss: 0.1760
model_3_3 (samples_aug) = [0.9167, 0.802, 0.2872, 0.0332]
Epoch 499 | Loss: 0.0286
model_3_3 (samples_aug) = [0.9821, 0.965, 0.0446, 0.0129]
Epoch 599 | Loss: 0.0114
model_3_3 (samples_aug) = [0.9931, 0.9846, 0.0166, 0.0063]
Epoch 699 | Loss: 0.0066
model_3_3 (samples_aug) = [0.9963, 0.9905, 0.0092, 0.0037]
Epoch 799 | Loss: 0.0044
model_3_3 (samples_aug) = [0.9977, 0.9933, 0.0061, 0.0025]
Epoch 899 | Loss: 0.0033
model_3_3 (samples_aug) = [0.9984, 0.9949, 0.0044, 0.0019]
Epoch 999 | Loss: 0.0026
model_3_3 (samples_aug) = [0.9988, 0.9959, 0.0034, 0.0015]
``````
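The script above uses the PyTorch 0.3 API (`torch.autograd.Variable`, `loss.data[0]`). On current releases, `Variable` is no longer needed and a scalar loss is read with `loss.item()`. Below is a minimal sketch of the same four-way comparison for recent versions, assuming PyTorch 1.0 or later. Unlike the 0.3 script, it calls `optimizer.zero_grad()` every step, so the losses it prints will not match the log above. The helpers `make_model` and `train` are illustrative names, not part of PyTorch.

```python
import torch

torch.manual_seed(2020)

samples = torch.tensor([[1., 1.], [0., 0.], [1., 0.], [0., 1.]])
# append the helper product a * b as a third column
samples_aug = torch.cat([samples, samples[:, :1] * samples[:, 1:]], dim=1)
labels = torch.tensor([1., 1., 0., 0.]).view(-1, 1)

def make_model(n_input, n_hidden):
    # same architecture as the script: n_input -> n_hidden -> 2 -> 1
    return torch.nn.Sequential(
        torch.nn.Linear(n_input, n_hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(n_hidden, 2),
        torch.nn.ReLU(),
        torch.nn.Linear(2, 1),
        torch.nn.Sigmoid(),
    )

def train(model, x, y, lr=0.1, epochs=1000):
    criterion = torch.nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        # full-batch step: shuffling does not change the mean loss or its gradient
        shuffle = torch.randperm(len(y))
        loss = criterion(model(x[shuffle]), y[shuffle])
        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()
        optimizer.step()
    return loss.item()  # loss at the last step

results = {}
for name, x, n_hidden in [("2_2", samples, 2), ("3_2", samples_aug, 2),
                          ("2_3", samples, 3), ("3_3", samples_aug, 3)]:
    model = make_model(x.shape[1], n_hidden)
    final_loss = train(model, x, labels)
    with torch.no_grad():
        preds = model(x).squeeze().tolist()
    results[name] = (final_loss, preds)
    print(name, round(final_loss, 4), [round(p, 4) for p in preds])
```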

Good luck.

K. Frank


Thank you for your response. I realized I needed to work on the optimization parameters, for example choosing between least-squares loss and cross-entropy. The Adam optimizer seems to work quite a bit better than SGD, and that helped me a lot.
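Following up on that, here is a minimal sketch of how the question's 4 → 2 → 2 → 1 model (redefined so the block is self-contained) might be trained with `torch.optim.Adam` instead of SGD, with a switch between `BCELoss` (cross-entropy) and `MSELoss` (least squares) on the sigmoid output. The learning rate `0.01`, the `2000` steps, the seed, and the helper name `fit` are assumptions for illustration, not settings reported in the thread. With only two hidden ReLU units, a run can still stall depending on the random initialization.

```python
import torch

class LogisticRegression(torch.nn.Module):
    # same 4 -> 2 -> 2 -> 1 network as in the question
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(4, 2)
        self.linear2 = torch.nn.Linear(2, 2)
        self.linear3 = torch.nn.Linear(2, 1)

    def forward(self, x):
        x = torch.relu(self.linear1(x))
        x = torch.relu(self.linear2(x))
        return torch.sigmoid(self.linear3(x))

training_samples = torch.tensor([[0., 1., 1., 0.],
                                 [1., 0., 0., 1.],
                                 [0., 1., 0., 1.],
                                 [1., 0., 1., 0.]])
labels = torch.tensor([1., 1., 0., 0.]).view(-1, 1)

def fit(loss_name="bce", lr=0.01, steps=2000, seed=0):
    """Hypothetical helper: train a fresh model with Adam, return (model, loss at last step)."""
    torch.manual_seed(seed)
    model = LogisticRegression()
    criterion = torch.nn.BCELoss() if loss_name == "bce" else torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(training_samples), labels)
        loss.backward()
        optimizer.step()
    return model, loss.item()

for loss_name in ("bce", "mse"):
    model, final_loss = fit(loss_name)
    with torch.no_grad():
        preds = model(training_samples).squeeze().tolist()
    print(loss_name, round(final_loss, 4), [round(p, 3) for p in preds])
```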