Beginner Q, why does this model not work?

ckvero · December 14, 2021, 3:52pm

I have two models. The first is a basic perceptron:

## Perceptron

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        x = torch.sigmoid(self.linear(x))
        return x

    def predict(self, x):
        # Threshold the sigmoid output at 0.5 (expects a single sample)
        pred = self.forward(x)
        if pred >= 0.5:
            return 1
        else:
            return 0

This model reaches 100% accuracy on the training data after ~1000 epochs and ~87% on the test dataset. I then tried a model with an additional layer:

class Model(nn.Module):
    def __init__(self, input_size, H1, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, H1)
        self.linear2 = nn.Linear(H1, output_size)

    def forward(self, x):
        x = torch.sigmoid(self.linear(x))
        x = torch.sigmoid(self.linear2(x))
        return x

    def predict(self, x):
        pred = self.forward(x)
        if pred >= 0.5:
            return 1
        else:
            return 0

Even across a wide range of sensible values for H1 (e.g. 5, 50, 500), this model reaches an accuracy of ~23% after 100-200 epochs and remains there after 2000 epochs.

Have I made a blunder here?

Edit: On further inspection, the 2nd model outputs the same class prediction for every input, and that class accounts for 23% of the data (hence the ~23% accuracy).
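When a model predicts one class for every input, its accuracy simply equals that class's share of the labels, so comparing the two is a quick sanity check. Below is a minimal sketch in plain Python; the function name `constant_prediction_report` and the 23/77 split are made up for illustration and are not taken from the original dataset. It assumes the predictions have already been collected into a list, for example by calling `model.predict` on each sample.

```python
from collections import Counter


def constant_prediction_report(predictions, labels):
    """Summarise predicted classes and compare accuracy with class frequencies."""
    n = len(labels)
    predicted_counts = Counter(predictions)
    # Fraction of the dataset that belongs to each true class
    label_share = {cls: count / n for cls, count in Counter(labels).items()}
    accuracy = sum(p == t for p, t in zip(predictions, labels)) / n
    return {
        "predicted_counts": dict(predicted_counts),
        "is_constant": len(predicted_counts) == 1,
        "label_share": label_share,
        "accuracy": accuracy,
    }


# Hypothetical data: 23 samples of class 1 and 77 of class 0,
# with a model that predicts class 1 for every input.
labels = [1] * 23 + [0] * 77
predictions = [1] * 100
report = constant_prediction_report(predictions, labels)
print(report)
```

If `is_constant` is true and the accuracy matches one entry of `label_share`, the network has most likely collapsed to a constant output, which points at training (optimizer settings, loss, data scaling) rather than at the accuracy computation.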

ckvero · December 14, 2021, 9:01pm

Any suggestions? What would be a good way to troubleshoot this?

ckvero · December 15, 2021, 9:55am

Is there another forum/discussion group for beginners' questions, please?

mMagmer · December 15, 2021, 11:18am

Hi,
I crafted a toy problem to test your model, and I can’t see anything wrong with it.
I made some small changes to your code so that it’s easier for me to compute the error.
Here is the code:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size, H1, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, H1)
        self.linear2 = nn.Linear(H1, output_size)
        
    def forward(self, x):
        x = torch.sigmoid(self.linear(x))
        x = torch.sigmoid(self.linear2(x))
        return x

    @torch.no_grad()
    def predict(self, x):
        # Threshold the sigmoid output at 0.5; works on a whole batch
        pred = self.forward(x)
        return torch.where(pred > .5, 1, 0)


def get_toyData(N1, N2):
    # Class 1: N1 points around (0, 0); class 0: N2 points around (1.5, 1.5)
    labels = torch.cat([torch.ones(N1), torch.zeros(N2)])
    data = torch.cat([torch.randn(N1, 2), torch.randn(N2, 2) + 1.5])
    return data, labels


N1, N2 = 1000, 300
train_data, train_labels = get_toyData(N1, N2)

dataset = torch.utils.data.TensorDataset(train_data, train_labels)
dl = torch.utils.data.DataLoader(dataset, batch_size=20, shuffle=True)

model = Model(input_size=2, H1=20, output_size=1)
criteria = nn.BCELoss()
optim = torch.optim.SGD(model.parameters(), lr=.1, momentum=.9)

for i in range(100):
    totalLoss = 0
    train_error = 0
    for x, t in dl:
        optim.zero_grad()
        out = model(x)
        loss = criteria(out.squeeze(), t)
        loss.backward()
        optim.step()
        totalLoss += loss.item()
        # Count misclassified samples in this batch
        pred = model.predict(x)
        train_error += (pred.squeeze() != t).sum().item()
    # Epoch index, summed batch loss, and training error rate
    print(i, totalLoss, train_error / len(dataset))

It works okay; I see no problem.
Maybe you have a bug in how you generate your dataset.

ckvero · December 15, 2021, 11:53am

Thank you for the reply. Your code was very useful and helped me identify the learning rate (lr) parameter as the issue. Interestingly, lr had to be decreased by several orders of magnitude to achieve good results with the multi-layer model compared with the single-layer one! All good now, thanks again!
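For anyone who hits the same symptom, one way to check whether the learning rate is the culprit is to train the same model from identical initial weights at several learning rates and compare the results. The sketch below is only an illustration under stated assumptions: the helper names (`make_toy_data`, `train_once`, `sweep_learning_rates`), the candidate learning rates, the full-batch training, and the toy data (loosely modelled on the Gaussian blobs in the reply above) are all made up here and are not the original poster's setup.

```python
import torch
import torch.nn as nn


def make_toy_data(n_pos=200, n_neg=60, seed=0):
    """Two overlapping 2-D Gaussian blobs (hypothetical stand-in dataset)."""
    gen = torch.Generator().manual_seed(seed)
    data = torch.cat([torch.randn(n_pos, 2, generator=gen),
                      torch.randn(n_neg, 2, generator=gen) + 1.5])
    labels = torch.cat([torch.ones(n_pos), torch.zeros(n_neg)])
    return data, labels


def train_once(lr, data, labels, hidden=20, epochs=50, seed=0):
    """Train a fresh two-layer sigmoid model; return (final_loss, accuracy)."""
    torch.manual_seed(seed)  # identical initial weights for every lr
    model = nn.Sequential(nn.Linear(2, hidden), nn.Sigmoid(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
    criterion = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):  # full-batch gradient descent for simplicity
        optimizer.zero_grad()
        loss = criterion(model(data).squeeze(1), labels)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        out = model(data).squeeze(1)
        final_loss = criterion(out, labels).item()
        accuracy = ((out >= 0.5).float() == labels).float().mean().item()
    return final_loss, accuracy


def sweep_learning_rates(lrs, data, labels):
    """Map each learning rate to its (final_loss, accuracy) pair."""
    return {lr: train_once(lr, data, labels) for lr in lrs}


if __name__ == "__main__":
    data, labels = make_toy_data()
    results = sweep_learning_rates([1.0, 0.1, 0.01, 0.001], data, labels)
    for lr, (loss, acc) in results.items():
        print(f"lr={lr:<6} loss={loss:.4f} accuracy={acc:.3f}")
```

Which learning rate works best depends on the data and the architecture, so the useful signal is how the loss and accuracy change across the sweep, not any particular value on this toy problem.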