Training Makes Model Predict Everything as the Same Value

When I train my model and evaluate it on the test data set, it returns the same output for every input. The network is meant to classify objects with 2 float features into one of 9 classes. The data is imbalanced, so I used a WeightedRandomSampler.
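For context, the sampler is built roughly like this (a simplified sketch; the exact weight computation in my code may differ):

import torch
from torch.utils.data import WeightedRandomSampler

# Count how often each of the 9 classes appears in the training labels
class_counts = torch.bincount(TrainOutput, minlength=9).float()

# Weight each sample by the inverse frequency of its class,
# so rare classes are drawn about as often as common ones
sample_weights = 1.0 / class_counts[TrainOutput]

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)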

My neural network:

import torch.nn as nn
import torch.nn.functional as f

class Net(nn.Module):
    def __init__(self, D_in, H1, H2, D_out):
        super(Net, self).__init__()
        # two hidden layers followed by a linear output layer
        self.linear1 = nn.Linear(D_in, H1)
        self.linear2 = nn.Linear(H1, H2)
        self.linear4 = nn.Linear(H2, D_out)

    def forward(self, x):
        x = f.relu(self.linear1(x))
        x = f.relu(self.linear2(x))
        x = self.linear4(x)  # raw logits, no softmax
        return x
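The model is created along these lines (the hidden sizes below are placeholders, not necessarily the values I use):

# 2 input features, 9 output classes; H1/H2 are placeholder hidden sizes
model = Net(D_in=2, H1=64, H2=64, D_out=9)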

What it returns on the test data before any training:

tensor([[-300.7366,  142.0265, 1203.9167,  ..., -366.3187,  930.1630,
         -460.1914],
        [-118.0432,   55.4708,  467.9875,  ..., -142.0467,  360.8930,
         -178.8175],
        [-165.0909,   77.8327,  658.5565,  ..., -200.1997,  508.4938,
         -251.7095],
        ...,
        [-166.6815,   78.6092,  665.3017,  ..., -202.2803,  513.7719,
         -254.2981],
        [-197.1317,   93.1154,  789.1254,  ..., -240.1015,  609.7628,
         -301.6740],
        [-130.0121,   61.4217,  520.3043,  ..., -158.2938,  402.0972,
         -198.9381]], grad_fn=<AddmmBackward0>)

Training:

for epoch in range(n_epochs):
    model.train(True)
    for i, xy in enumerate(trainloader):
        x, y = xy
        optimizer.zero_grad()   # clear gradients from the previous step
        z = model(x)            # forward pass: raw logits
        loss = Loss(z, y)
        loss.backward()         # backpropagate
        optimizer.step()        # update the weights
        loss_list.append(loss.item())  # .item() detaches and converts to a float

    print("Epoch is", epoch)

What it returns on the test data after training:

tensor([[-0.0234,  0.0055,  0.0225,  ..., -0.0304, -0.0017,  0.0139],
        [-0.0234,  0.0055,  0.0225,  ..., -0.0304, -0.0017,  0.0139],
        [-0.0234,  0.0055,  0.0225,  ..., -0.0304, -0.0017,  0.0139],
        ...,
        [-0.0234,  0.0055,  0.0225,  ..., -0.0304, -0.0017,  0.0139],
        [-0.0234,  0.0055,  0.0225,  ..., -0.0304, -0.0017,  0.0139],
        [-0.0234,  0.0055,  0.0225,  ..., -0.0304, -0.0017,  0.0139]],
       grad_fn=<AddmmBackward0>)

Any help or advice would be appreciated.

Thanks,
Rishav

Did you check that the dataloader returns class-balanced (and random) samples (by running a for loop over it and collecting statistics)?
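Something along these lines would do (just a sketch, adapt the names to your code):

from collections import Counter

counts = Counter()
for x, y in trainloader:
    # y is a batch of labels; count how often each class gets drawn
    counts.update(y.tolist())
print(counts)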

Best regards

Thomas

I confirmed that the dataloader returns class-balanced and random samples.

Thanks,
Rishav

Hi Rishav,

What does the loss do?
Did you try to reduce your learning rate?

Best regards

Thomas

Hello Tom,

Reducing my learning rate did not solve my problem. The loss variable is used for optimization.

Thanks,
Rishav

Hi Rishav,

but does the loss go up, go down, or stay the same?
If you set the lr to 0, the weights should stay exactly the same, so I'm tempted to say it would be good to find where it starts to do funny things.
Which optimizer are you using, and with which parameters? Sometimes when you do funny things with momentum or some such, you get the strangest results.
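For the lr = 0 check, something like this sketch would show whether any parameter moves between steps:

import copy
import torch

before = copy.deepcopy(model.state_dict())
# ... run one or more training steps with lr set to 0 ...
after = model.state_dict()
for name in before:
    if not torch.equal(before[name], after[name]):
        print("parameter changed:", name)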

Best regards

Thomas

Hello Tom,

The loss value seems to go down until it reaches around 2, where it stays. When the learning rate is 0, nothing changes after training. I am using an SGD optimizer with learning rate = 0.01 and momentum = 0.9. No matter how low I set the learning rate, the problem persists.
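Concretely, the optimizer is set up like this:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)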

Thanks,
Rishav

You could see if you can overfit a single batch to be sure your loss function is working as expected.
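Something like this, reusing your existing model, Loss, and optimizer (just a sketch):

# Grab one fixed batch and train on it repeatedly.
# If the model, loss, and optimizer are set up correctly,
# the loss should drop close to zero.
x, y = next(iter(trainloader))
for step in range(500):
    optimizer.zero_grad()
    z = model(x)
    loss = Loss(z, y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())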

Hello Tom,

I determined that I am either putting the data in incorrectly or there is an error in the training loop. Could you look at my code and help me find the error?

Dataloader:

from torch.utils.data import Dataset, DataLoader

class Data(Dataset):
    def __init__(self):
        self.x = TrainData    # features, shape (N, 2)
        self.y = TrainOutput  # integer class labels, shape (N,)
        self.len = self.x.shape[0]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len

# sampler is the WeightedRandomSampler mentioned above
trainloader = DataLoader(dataset = Data(), batch_size = 1, sampler = sampler)

Training:

for epoch in range(n_epochs):
    model.train(True)

    running_loss = 0.
    last_loss = 0.
    for i, xy in enumerate(trainloader):
        x, y = xy
        optimizer.zero_grad()
        z = model(x)           # forward pass: raw logits
        loss = Loss(z, y)
        loss.backward()
        optimizer.step()
        loss_list.append(loss.item())
        running_loss += loss.item()
        test[y] += 1           # per-class counter, to check what the sampler draws
        if i % 1000 == 999:
            last_loss = running_loss / 1000  # average loss over the last 1000 batches
            running_loss = 0.

    print("Epoch is", epoch)
    epoch_number += 1

Model:

class Net(nn.Module):
    def __init__(self, D_in, H1, H2, D_out):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(D_in, H1)
        self.linear2 = nn.Linear(H1, H2)
        self.linear4 = nn.Linear(H2, D_out)

    def forward(self, x):
        x = f.relu(self.linear1(x))
        x = f.relu(self.linear2(x))
        x = self.linear4(x)
        return x

Here is what the variables look like.

TrainData:

Shape: torch.Size([50316, 2])
Value: tensor([[ 6.5819e+03,  3.5400e+00],
        [ 3.4742e+03,  9.2000e-02],
        [ 5.3910e+03,  3.6600e+00],
        ...,
        [ 3.6822e+03, -3.1800e-01],
        [ 6.2419e+03,  3.2160e+00],
        [ 5.6387e+03,  1.4360e+01]])

TrainOutput:

Shape: torch.Size([50316])
Value: tensor([6, 4, 5, ..., 4, 6, 8])

Thanks,
Rishav

To add on, using activation functions from torch.nn seems to cause the same problem even before any training occurs. When activation functions from torch or torch.nn.functional are used instead, the results displayed before training are random.

To me, it looks as if the trouble is more likely to be in your loss or model than in the bits you posted.
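For example, if Loss is meant to be cross-entropy over the 9 raw logits your model returns (which is what I would assume here), it would be set up like this:

import torch.nn as nn

# expects raw logits of shape (batch, 9) and integer class labels of shape (batch,)
Loss = nn.CrossEntropyLoss()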