I have two models. The first is a basic perceptron:
## Perceptron
class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        x = torch.sigmoid(self.linear(x))
        return x

    def predict(self, x):
        pred = self.forward(x)
        if pred >= 0.5:
            return 1
        else:
            return 0
This model reaches 100% accuracy on the training data after ~1000 epochs and ~87% on the test dataset. I then tried a model with an additional layer:
class Model(nn.Module):
    def __init__(self, input_size, H1, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, H1)
        self.linear2 = nn.Linear(H1, output_size)

    def forward(self, x):
        x = torch.sigmoid(self.linear(x))
        x = torch.sigmoid(self.linear2(x))
        return x

    def predict(self, x):
        pred = self.forward(x)
        if pred >= 0.5:
            return 1
        else:
            return 0
Even across a wide range of sensible values for H1 (e.g. 5, 50, 500), this model reaches an accuracy of ~23% after 100-200 epochs and remains there after 2000 epochs.
Have I made a blunder here?
Edit: On further inspection, the 2nd model outputs the same class prediction for each input, which is the majority class (23% of the data).
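A quick way to confirm this kind of collapse is to count the predicted classes next to the true classes. The sketch below is plain Python; `prediction_summary` is a hypothetical helper name, and the 23/77 split is made-up data chosen only to mirror the 23% figure above. When a model predicts one class for every input, its accuracy equals that class's share of the data.

```python
from collections import Counter


def prediction_summary(preds, labels):
    """Return (predicted-class counts, true-class counts, accuracy)."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return Counter(preds), Counter(labels), correct / len(labels)


# Hypothetical data: 23 positives among 100 samples,
# and a collapsed model that answers 1 for every input.
labels = [1] * 23 + [0] * 77
preds = [1] * 100

pred_counts, label_counts, acc = prediction_summary(preds, labels)
print("predicted:", dict(pred_counts))
print("actual:   ", dict(label_counts))
print("accuracy: ", acc)
```

If `pred_counts` contains a single key, the network is ignoring its input, and the accuracy figure says nothing beyond the class balance.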
Hi,
I crafted a toy problem to test your model, and I can't see anything wrong with it.
I made some small changes to your code so that it's easier for me to compute the error.
Here is the code:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size, H1, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, H1)
        self.linear2 = nn.Linear(H1, output_size)

    def forward(self, x):
        x = torch.sigmoid(self.linear(x))
        x = torch.sigmoid(self.linear2(x))
        return x

    @torch.no_grad()
    def predict(self, x):
        pred = self.forward(x)
        return torch.where(pred > 0.5, 1, 0)

def get_toyData(N1, N2):
    labels = torch.cat([torch.ones(N1), torch.zeros(N2)])
    data = torch.cat([torch.randn(N1, 2), torch.randn(N2, 2) + 1.5])
    return data, labels

N1, N2 = 1000, 300
train_data, train_labels = get_toyData(N1, N2)
dataset = torch.utils.data.TensorDataset(train_data, train_labels)
dl = torch.utils.data.DataLoader(dataset, batch_size=20, shuffle=True)
model = Model(input_size=2, H1=20, output_size=1)
criteria = nn.BCELoss()
optim = torch.optim.SGD(model.parameters(), lr=.1, momentum=.9)

for i in range(100):
    totalLoss = 0
    train_error = 0
    for x, t in dl:
        optim.zero_grad()
        out = model(x)
        loss = criteria(out.squeeze(), t)
        loss.backward()
        optim.step()
        totalLoss += loss.item()
        # Count the misclassified samples in this batch
        pred = model.predict(x)
        error = (pred.squeeze() != t).sum().item()
        train_error += error
    # Per epoch: summed batch loss and error rate over the 1300 samples
    print(i, totalLoss, train_error / 1300)
It works okay; I see no problem.
Maybe you have a bug in how you generate your dataset.
Thank you for the reply. Your code was very useful and helped me identify the learning rate (lr) parameter as the issue. Interestingly, the multi-layer model needed a learning rate several orders of magnitude lower than the single-layer model to achieve good results! All good now, thanks again!
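For anyone hitting the same symptom, a small learning-rate sweep is one way to check whether lr is the culprit. This is only a sketch under assumptions: the helper names (`make_toy_data`, `train_once`), the candidate learning rates, the full-batch training, and the smaller dataset are all illustrative choices, not the settings from the original problem. The best value will depend on the data.

```python
import torch
import torch.nn as nn


def make_toy_data(n_pos=200, n_neg=60, seed=0):
    """Two overlapping 2-D Gaussian blobs, in the spirit of the toy data above."""
    gen = torch.Generator().manual_seed(seed)
    data = torch.cat([torch.randn(n_pos, 2, generator=gen),
                      torch.randn(n_neg, 2, generator=gen) + 1.5])
    labels = torch.cat([torch.ones(n_pos), torch.zeros(n_neg)])
    return data, labels


def train_once(lr, data, labels, hidden=20, epochs=200, seed=0):
    """Train a fresh two-layer sigmoid network with plain SGD; return (loss, accuracy)."""
    torch.manual_seed(seed)  # same initial weights for every learning rate
    model = nn.Sequential(nn.Linear(2, hidden), nn.Sigmoid(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
    criterion = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):  # full-batch updates keep the sketch short
        optimizer.zero_grad()
        loss = criterion(model(data).squeeze(1), labels)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        out = model(data).squeeze(1)
        final_loss = criterion(out, labels).item()
        accuracy = ((out >= 0.5).float() == labels).float().mean().item()
    return final_loss, accuracy


data, labels = make_toy_data()
# Candidate learning rates are arbitrary; spread them over orders of magnitude.
results = {lr: train_once(lr, data, labels) for lr in (1e-3, 1e-2, 1e-1, 1.0)}
for lr, (loss, acc) in results.items():
    print(f"lr={lr:g}  loss={loss:.4f}  accuracy={acc:.3f}")
```

Fixing the seed inside `train_once` means every run starts from the same weights, so differences between rows come from the learning rate alone.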