Hello pemfir!
pemfir:
Here is the rule I am hoping the model will learn:
When input is [1., 0., 0., 1.] , the label is 1
When input is [0., 1., 1., 0.] , the label is 1
When input is [1., 0., 1., 0.] , the label is 0
When input is [0., 1., 0., 1.] , the label is 0
Is such a pattern something a non-linear model cannot capture?
The short answer is that a variant of your neural network can capture
this pattern if you add a third “hidden neuron” to your first layer.
That is, change:
class LogisticRegression(torch.nn.Module):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear1 = torch.nn.Linear(4, 2)
        self.linear2 = torch.nn.Linear(2, 2)
        self.linear3 = torch.nn.Linear(2, 1)
    ...
to something like this:
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear1 = torch.nn.Linear(4, 3)
        self.linear2 = torch.nn.Linear(3, 2)
        self.linear3 = torch.nn.Linear(2, 1)
    ...
Note that to train this “three-neuron” network down to a low loss, I
increased the learning rate to 0.1 and the number of epochs to 1000.
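As a minimal sketch of what the full “three-neuron” module and its training loop might look like on the original four-column inputs: the class name `ThreeNeuronNet`, the ReLU / sigmoid activations, and the use of `BCELoss` are assumptions on my part (they mirror the script further down, not your original code), and this is written against a current PyTorch release rather than 0.3.0.

```python
import torch

# Hypothetical class name; the layer sizes (4 -> 3 -> 2 -> 1) are the ones
# suggested above.  The activations are an assumption borrowed from the
# Sequential models used later in this post.
class ThreeNeuronNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(4, 3)   # three hidden neurons
        self.linear2 = torch.nn.Linear(3, 2)
        self.linear3 = torch.nn.Linear(2, 1)

    def forward(self, x):
        x = torch.relu(self.linear1(x))
        x = torch.relu(self.linear2(x))
        return torch.sigmoid(self.linear3(x))  # probability of label 1

# The four samples and labels from the question.
samples4 = torch.tensor([[1., 0., 0., 1.],
                         [0., 1., 1., 0.],
                         [1., 0., 1., 0.],
                         [0., 1., 0., 1.]])
labels4 = torch.tensor([[1.], [1.], [0.], [0.]])

torch.manual_seed(2020)
net = ThreeNeuronNet()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)  # learning rate 0.1

for epoch in range(1000):                               # 1000 epochs
    loss = criterion(net(samples4), labels4)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    final_pred = net(samples4)
print(final_pred.squeeze().tolist())
```

Whether a given run reaches a low loss depends on the random initialization, so treat any single run as illustrative rather than as a guarantee.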
Some further comments:
I don’t have a good explanation as to why adding this neuron is
necessary / sufficient, or how to analyze what other minimal tweaks
would work.
Upon inspection, your samples contain redundant information:
s[i, 1] == s[i, 2] if, and only if, s[i, 0] == s[i, 3]. And your
labels are 1 exactly when this equality holds.
We can write a simple (quadratic) formula for your labels:
l[i] = s[i, 1] * s[i, 2] + (1 - s[i, 1]) * (1 - s[i, 2])
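Both observations are easy to check by hand, but here is a small plain-Python sketch that does it mechanically. The variable names are just for illustration; the expanded form `1 - a - b + 2*a*b` is the same quadratic multiplied out.

```python
# The four samples and labels from the question.
samples = [[1., 0., 0., 1.],
           [0., 1., 1., 0.],
           [1., 0., 1., 0.],
           [0., 1., 0., 1.]]
labels = [1., 1., 0., 0.]

formula_labels = []
for s, label in zip(samples, labels):
    # Redundancy: the inner pair is equal exactly when the outer pair is.
    assert (s[1] == s[2]) == (s[0] == s[3])

    a, b = s[1], s[2]
    quadratic = a * b + (1 - a) * (1 - b)   # the formula as written above
    expanded = 1 - a - b + 2 * a * b        # the same formula, multiplied out
    assert quadratic == expanded == label
    formula_labels.append(quadratic)

print(formula_labels)  # [1.0, 1.0, 0.0, 0.0]
```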
Neural networks can certainly reproduce (good approximations to)
quadratic functions, but I don’t have a good understanding of how
minimal a neural network can be and still do this.
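One data point on the minimality question, offered as a sketch rather than an explanation: with weights picked by hand (they are chosen for illustration, not learned, and do not come from any script in this thread), a Linear / ReLU / Linear / ReLU / Linear / Sigmoid stack with only two neurons in each hidden layer can already represent the labels on two binary inputs. The trick is that `relu(a - b) + relu(b - a)` equals `|a - b|`. This says nothing about whether gradient descent will find such weights from a random start.

```python
import math

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def hand_set_2_2(a, b):
    """A 2 -> 2 -> 2 -> 1 network with hand-picked (not learned) weights."""
    # First layer: two ReLU neurons that together compute |a - b|.
    h1 = relu(a - b)
    h2 = relu(b - a)
    # Second layer: z1 is 1 when a == b and 0 otherwise; z2 is left unused.
    z1 = relu(1.0 - h1 - h2)
    z2 = relu(0.0 * h1 + 0.0 * h2)
    # Output layer: push z1 through a steep sigmoid.
    return sigmoid(10.0 * z1 + 0.0 * z2 - 5.0)

samples = [(1., 1.), (0., 0.), (1., 0.), (0., 1.)]
labels = [1, 1, 0, 0]

outputs = [hand_set_2_2(a, b) for a, b in samples]
print([round(o, 4) for o in outputs])  # [0.9933, 0.9933, 0.0067, 0.0067]
```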
It’s interesting to note that if we “help” the network by adding a single
product to the input sample, we can train successfully with only two
hidden neurons in the first layer.
So, let’s get rid of the redundant inputs, and add the (mathematically
redundant) helper product:
training_samples = torch.tensor([[1., 1., 1.],
                                 [0., 0., 0.],
                                 [1., 0., 0.],
                                 [0., 1., 0.]])
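For what it’s worth, here is a sketch of how that tensor can be derived from the original four-column samples instead of being typed in by hand. It assumes the two inputs being kept are columns 0 and 3; that is an inference from the values (it reproduces the tensor above row for row), and `samples4`, `kept`, and `helper` are just illustrative names. It uses the current tensor API rather than 0.3.0.

```python
import torch

# The original four-column samples from the question.
samples4 = torch.tensor([[1., 0., 0., 1.],
                         [0., 1., 1., 0.],
                         [1., 0., 1., 0.],
                         [0., 1., 0., 1.]])

# Assumption: keep columns 0 and 3 and drop the redundant columns 1 and 2.
kept = samples4[:, [0, 3]]

# The helper feature is the product of the two kept inputs.
helper = kept.prod(dim=1, keepdim=True)

# Kept inputs plus the helper product, one row per sample.
training_samples = torch.cat([kept, helper], dim=1)
print(training_samples)
```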
Here is a PyTorch version 0.3.0 script that illustrates training with
a third neuron and with the helper product:
import torch
torch.__version__

torch.manual_seed (2020)

samples = torch.autograd.Variable (torch.FloatTensor ([
    [1., 1.],
    [0., 0.],
    [1., 0.],
    [0., 1.]
]))

samples_aug = torch.autograd.Variable (torch.FloatTensor ([
    [1., 1., 1.],
    [0., 0., 0.],
    [1., 0., 0.],
    [0., 1., 0.]
]))

labels = torch.autograd.Variable (torch.FloatTensor ([1., 1., 0., 0.]).view(-1, 1))

nInput = 2
nHidden = 2

model_2_2 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

nInput = 3
nHidden = 2

model_3_2 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

nInput = 2
nHidden = 3

model_2_3 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

nInput = 3
nHidden = 3
model_3_3 = torch.nn.Sequential(
    torch.nn.Linear (nInput, nHidden),
    torch.nn.ReLU(),
    torch.nn.Linear (nHidden, 2),
    torch.nn.ReLU(),
    torch.nn.Linear (2, 1),
    torch.nn.Sigmoid()
)

criterion = torch.nn.BCELoss()


optimizer_2_2 = torch.optim.SGD (model_2_2.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))   # shuffling the training data
    y_pred = model_2_2 (samples[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    optimizer_2_2.zero_grad()
    loss.backward()
    optimizer_2_2.step()
    if (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_2_2 (samples) =', [round (v, 4) for v in model_2_2 (samples).data.squeeze().tolist()])

optimizer_3_2 = torch.optim.SGD (model_3_2.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))   # shuffling the training data
    y_pred = model_3_2 (samples_aug[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    optimizer_3_2.zero_grad()
    loss.backward()
    optimizer_3_2.step()
    if (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_3_2 (samples_aug) =', [round (v, 4) for v in model_3_2 (samples_aug).data.squeeze().tolist()])

optimizer_2_3 = torch.optim.SGD (model_2_3.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))   # shuffling the training data
    y_pred = model_2_3 (samples[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    optimizer_2_3.zero_grad()
    loss.backward()
    optimizer_2_3.step()
    if (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_2_3 (samples) =', [round (v, 4) for v in model_2_3 (samples).data.squeeze().tolist()])

optimizer_3_3 = torch.optim.SGD (model_3_3.parameters(), lr=.1)

for epoch in range (1000):
    shuffle = torch.randperm (len (labels))   # shuffling the training data
    y_pred = model_3_3 (samples_aug[shuffle])
    loss = criterion (y_pred, labels[shuffle])
    optimizer_3_3.zero_grad()
    loss.backward()
    optimizer_3_3.step()
    if (epoch + 1) % 100 == 0:
        print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
        print ('model_3_3 (samples_aug) =', [round (v, 4) for v in model_3_3 (samples_aug).data.squeeze().tolist()])

And here is the output:
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x00000210DE836630>
>>>
>>> samples = torch.autograd.Variable (torch.FloatTensor ([
...     [1., 1.],
...     [0., 0.],
...     [1., 0.],
...     [0., 1.]
... ]))
>>>
>>> samples_aug = torch.autograd.Variable (torch.FloatTensor ([
...     [1., 1., 1.],
...     [0., 0., 0.],
...     [1., 0., 0.],
...     [0., 1., 0.]
... ]))
>>>
>>> labels = torch.autograd.Variable (torch.FloatTensor ([1., 1., 0., 0.]).view(-1, 1))
>>>
>>> nInput = 2
>>> nHidden = 2
>>>
>>> model_2_2 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> nInput = 3
>>> nHidden = 2
>>>
>>> model_3_2 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> nInput = 2
>>> nHidden = 3
>>>
>>> model_2_3 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> nInput = 3
>>> nHidden = 3
>>> model_3_3 = torch.nn.Sequential(
...     torch.nn.Linear (nInput, nHidden),
...     torch.nn.ReLU(),
...     torch.nn.Linear (nHidden, 2),
...     torch.nn.ReLU(),
...     torch.nn.Linear (2, 1),
...     torch.nn.Sigmoid()
... )
>>>
>>> criterion = torch.nn.BCELoss()
>>>
>>>
>>> optimizer_2_2 = torch.optim.SGD (model_2_2.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))   # shuffling the training data
...     y_pred = model_2_2 (samples[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     optimizer_2_2.zero_grad()
...     loss.backward()
...     optimizer_2_2.step()
...     if (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_2_2 (samples) =', [round (v, 4) for v in model_2_2 (samples).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6932
model_2_2 (samples) = [0.5061, 0.5061, 0.5061, 0.5061]
Epoch 199 | Loss: 0.6931
model_2_2 (samples) = [0.5002, 0.5002, 0.5002, 0.5002]
Epoch 299 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 399 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 499 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 599 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 699 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 799 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 899 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
Epoch 999 | Loss: 0.6931
model_2_2 (samples) = [0.5, 0.5, 0.5, 0.5]
>>>
>>> optimizer_3_2 = torch.optim.SGD (model_3_2.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))   # shuffling the training data
...     y_pred = model_3_2 (samples_aug[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     optimizer_3_2.zero_grad()
...     loss.backward()
...     optimizer_3_2.step()
...     if (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_3_2 (samples_aug) =', [round (v, 4) for v in model_3_2 (samples_aug).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6926
model_3_2 (samples_aug) = [0.5092, 0.5072, 0.5093, 0.5058]
Epoch 199 | Loss: 0.6889
model_3_2 (samples_aug) = [0.5008, 0.4941, 0.5008, 0.4852]
Epoch 299 | Loss: 0.6755
model_3_2 (samples_aug) = [0.5063, 0.4866, 0.5063, 0.448]
Epoch 399 | Loss: 0.6013
model_3_2 (samples_aug) = [0.542, 0.5209, 0.5416, 0.2982]
Epoch 499 | Loss: 0.4251
model_3_2 (samples_aug) = [0.6544, 0.6344, 0.5311, 0.0498]
Epoch 599 | Loss: 0.1344
model_3_2 (samples_aug) = [0.7972, 0.7972, 0.0621, 0.0127]
Epoch 699 | Loss: 0.0678
model_3_2 (samples_aug) = [0.8842, 0.8842, 0.0163, 0.0068]
Epoch 799 | Loss: 0.0442
model_3_2 (samples_aug) = [0.9216, 0.9216, 0.0081, 0.0046]
Epoch 899 | Loss: 0.0325
model_3_2 (samples_aug) = [0.9414, 0.9414, 0.0052, 0.0035]
Epoch 999 | Loss: 0.0255
model_3_2 (samples_aug) = [0.9534, 0.9534, 0.0036, 0.0027]
>>>
>>> optimizer_2_3 = torch.optim.SGD (model_2_3.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))   # shuffling the training data
...     y_pred = model_2_3 (samples[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     optimizer_2_3.zero_grad()
...     loss.backward()
...     optimizer_2_3.step()
...     if (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_2_3 (samples) =', [round (v, 4) for v in model_2_3 (samples).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6537
model_2_3 (samples) = [0.538, 0.5671, 0.4874, 0.53]
Epoch 199 | Loss: 0.5327
model_2_3 (samples) = [0.6096, 0.67, 0.3885, 0.529]
Epoch 299 | Loss: 0.2323
model_2_3 (samples) = [0.8878, 0.8619, 0.2533, 0.3082]
Epoch 399 | Loss: 0.0967
model_2_3 (samples) = [0.9735, 0.9617, 0.145, 0.1497]
Epoch 499 | Loss: 0.0577
model_2_3 (samples) = [0.9885, 0.9832, 0.0957, 0.0955]
Epoch 599 | Loss: 0.0408
model_2_3 (samples) = [0.9925, 0.9894, 0.0694, 0.0694]
Epoch 699 | Loss: 0.0308
model_2_3 (samples) = [0.9951, 0.993, 0.0547, 0.054]
Epoch 799 | Loss: 0.0249
model_2_3 (samples) = [0.996, 0.9945, 0.0439, 0.0439]
Epoch 899 | Loss: 0.0210
model_2_3 (samples) = [0.9964, 0.9953, 0.037, 0.037]
Epoch 999 | Loss: 0.0178
model_2_3 (samples) = [0.9973, 0.9964, 0.0318, 0.0318]
>>>
>>> optimizer_3_3 = torch.optim.SGD (model_3_3.parameters(), lr=.1)
>>>
>>> for epoch in range (1000):
...     shuffle = torch.randperm (len (labels))   # shuffling the training data
...     y_pred = model_3_3 (samples_aug[shuffle])
...     loss = criterion (y_pred, labels[shuffle])
...     optimizer_3_3.zero_grad()
...     loss.backward()
...     optimizer_3_3.step()
...     if (epoch + 1) % 100 == 0:
...         print ('Epoch %d | Loss: %.4f' % (epoch, loss.data[0]))
...         print ('model_3_3 (samples_aug) =', [round (v, 4) for v in model_3_3 (samples_aug).data.squeeze().tolist()])
...
Epoch 99 | Loss: 0.6778
model_3_3 (samples_aug) = [0.5113, 0.4968, 0.5173, 0.4568]
Epoch 199 | Loss: 0.5863
model_3_3 (samples_aug) = [0.5727, 0.5431, 0.5767, 0.2665]
Epoch 299 | Loss: 0.4890
model_3_3 (samples_aug) = [0.6795, 0.6138, 0.6359, 0.0656]
Epoch 399 | Loss: 0.1760
model_3_3 (samples_aug) = [0.9167, 0.802, 0.2872, 0.0332]
Epoch 499 | Loss: 0.0286
model_3_3 (samples_aug) = [0.9821, 0.965, 0.0446, 0.0129]
Epoch 599 | Loss: 0.0114
model_3_3 (samples_aug) = [0.9931, 0.9846, 0.0166, 0.0063]
Epoch 699 | Loss: 0.0066
model_3_3 (samples_aug) = [0.9963, 0.9905, 0.0092, 0.0037]
Epoch 799 | Loss: 0.0044
model_3_3 (samples_aug) = [0.9977, 0.9933, 0.0061, 0.0025]
Epoch 899 | Loss: 0.0033
model_3_3 (samples_aug) = [0.9984, 0.9949, 0.0044, 0.0019]
Epoch 999 | Loss: 0.0026
model_3_3 (samples_aug) = [0.9988, 0.9959, 0.0034, 0.0015]
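If you are on a current PyTorch release rather than 0.3.0, the script above needs small changes: `Variable` wrappers are no longer needed, and `loss.data[0]` becomes `loss.item()`. Here is a sketch of the same experiment in that style. The helpers `build_model` and `train` are hypothetical names introduced only to avoid repeating the four near-identical blocks, and because the models are built in a different order (and weight initialization has changed between versions), the numbers should not be expected to match the output above.

```python
import torch

def build_model(n_input, n_hidden):
    """Hypothetical helper: same layer stack as the four models above."""
    return torch.nn.Sequential(
        torch.nn.Linear(n_input, n_hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(n_hidden, 2),
        torch.nn.ReLU(),
        torch.nn.Linear(2, 1),
        torch.nn.Sigmoid(),
    )

def train(model, x, y, lr=0.1, epochs=1000):
    """Hypothetical helper: same SGD / BCELoss loop as above; returns the last loss."""
    criterion = torch.nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        shuffle = torch.randperm(len(y))      # shuffling the training data
        loss = criterion(model(x[shuffle]), y[shuffle])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()                         # replaces loss.data[0]

torch.manual_seed(2020)

samples = torch.tensor([[1., 1.], [0., 0.], [1., 0.], [0., 1.]])
samples_aug = torch.tensor([[1., 1., 1.], [0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
labels = torch.tensor([1., 1., 0., 0.]).view(-1, 1)

# Names follow the post's convention: model_<nInput>_<nHidden>.
configs = [('model_2_2', 2, samples),
           ('model_3_2', 2, samples_aug),
           ('model_2_3', 3, samples),
           ('model_3_3', 3, samples_aug)]

results = {}
for name, n_hidden, x in configs:
    model = build_model(x.shape[1], n_hidden)
    final_loss = train(model, x, labels)
    with torch.no_grad():
        preds = model(x).squeeze().tolist()
    results[name] = (final_loss, preds)
    print('%s | final loss: %.4f | predictions: %s'
          % (name, final_loss, [round(v, 4) for v in preds]))
```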
Good luck.
K. Frank