Wrong output: average loss doesn't go below a certain number

Hey Community,
I'm very new to PyTorch and recently wrote my first working program.
I'm doing this for my bachelor's degree and am having problems with the output of my model.

I am using a dataset of the number of people in a room at the same time, depending on parameters like the date, time, holidays, weather conditions and so on. The aim is to predict the number of people at a certain time in the future, like half an hour, an hour or two hours ahead.

There are 13 input values and one output.

I think the biggest problem I have right now is configuring the parameters so that my loss drops lower than it currently does. At the moment, the average loss always stays around 1.5.

I hope some of you have tips or tricks on how to handle my data correctly, since this is required for my degree.

Here is my code:

Thank you in advance!!!

Kind regards
Christian Richter

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import pandas as pd
import matplotlib.pyplot as plt

### Dataset ###

dataset = pd.read_csv('./data/csv/train_data_csv_2.csv')

x_temp = dataset.iloc[:, :-1].values


y_temp = dataset.iloc[:, 13:].values


x_train_tensor = torch.FloatTensor(x_temp)
y_train_tensor = torch.FloatTensor(y_temp)

### Network Architecture ###

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.linear1 = nn.Linear(13, 13)  # 13 input neurons, 13 output neurons, linear layer
        self.linear2 = nn.Linear(13, 13)
        self.linear3 = nn.Linear(13, 13)
        self.linear4 = nn.Linear(13, 1600)

    def forward(self, x):
        pax_predict = F.torch.sigmoid(self.linear1(x))
        pax_predict = F.torch.sigmoid(self.linear2(x))
        pax_predict = F.torch.sigmoid(self.linear3(x))
        pax_predict = self.linear4(x)
        return pax_predict

    def num_flat_features(self, pax_predict):
        size = pax_predict.size()[1:]
        num = 1
        for i in size:
            num *= i
        return num

network = Network()

### Loss Function ###

criterion = nn.MSELoss()

target = Variable(y_train_tensor)


#optimizer = torch.optim.SGD(network.parameters(), lr=0.00000001)       # epochs: 50-100
#optimizer = torch.optim.Adam(network.parameters(), lr=5)               # epochs: 50
#optimizer = torch.optim.Adadelta(network.parameters(), lr=10)         # epochs: 50
optimizer = torch.optim.SGD(network.parameters(), lr=1, momentum=0.8)               # epochs: 50-100

### Training ###

for epoch in range(100):
    input = Variable(x_train_tensor)
    y_pred = network(input)

    loss = criterion(y_pred, target)

    loss_avg = loss / len(y_train_tensor)

    print('Epoch:', epoch, ' Total Loss:', loss.data)
    print('Average Loss:', loss_avg)

    plt.scatter(epoch, loss_avg.data, color='r', s=10, marker='o')


#test_exp = torch.Tensor([[40116],[33], [1], [0], [1], [0], [0], [0], [0]])
test_exp = torch.Tensor([[231216,36,5,0,1,0,0,0,0,1,1,1,1]])

result = network(test_exp).data[0][0].item()

print('Predicted number: ', result)
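As posted, the training loop computes the loss but never calls loss.backward() or optimizer.step(), so the weights are not updated inside this loop (the full script may differ). Two details in the network also look unintended: forward() applies every layer to the raw input x, so linear1 to linear3 and their sigmoids have no effect on the output, and linear4 produces 1600 values per sample although there is only one target (nn.MSELoss then broadcasts the (N, 1) target against the (N, 1600) output and warns about the size mismatch). Below is a minimal sketch of a corrected model and loop. It uses random stand-in data (hypothetical) instead of the CSV file, and the learning rate of 0.1 is an assumption, not a tuned value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in data (the CSV file is not available here):
# 256 samples with 13 features and one target value per sample.
x_train_tensor = torch.randn(256, 13)
y_train_tensor = x_train_tensor[:, :3].sum(dim=1, keepdim=True) + 5.0


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(13, 13)
        self.linear2 = nn.Linear(13, 13)
        self.linear3 = nn.Linear(13, 13)
        self.linear4 = nn.Linear(13, 1)  # one output per sample, not 1600

    def forward(self, x):
        # Each layer consumes the previous layer's output, not the raw input x.
        h = torch.sigmoid(self.linear1(x))
        h = torch.sigmoid(self.linear2(h))
        h = torch.sigmoid(self.linear3(h))
        return self.linear4(h)


network = Network()
criterion = nn.MSELoss()
# lr=0.1 is an assumed starting point, not a tuned value.
optimizer = torch.optim.SGD(network.parameters(), lr=0.1, momentum=0.8)

losses = []
for epoch in range(100):
    optimizer.zero_grad()                     # clear gradients from the last step
    y_pred = network(x_train_tensor)
    loss = criterion(y_pred, y_train_tensor)
    loss.backward()                           # compute gradients
    optimizer.step()                          # update the weights
    losses.append(loss.item())
    if epoch % 10 == 0:
        print('Epoch:', epoch, ' Loss:', loss.item())
```

The three calls zero_grad(), backward() and step() are what actually train the network; without them the printed loss only reflects the randomly initialized weights.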

The code looks generally alright, apart from some minor issues:

  • Variables have been deprecated since PyTorch 0.4.0, so in newer versions you can use tensors directly
  • Although this shouldn’t be an issue in your code, you shouldn’t use the .data attribute, as it might cause silent errors in training code
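Following the first point, here is a sketch of how the tensor creation in the posted script might look without Variable, assuming PyTorch 0.4.0 or newer. x_temp is a random stand-in for the values read from the CSV file.

```python
import numpy as np
import torch

# Hypothetical array standing in for dataset.iloc[:, :-1].values
x_temp = np.random.rand(4, 13)

# Before PyTorch 0.4.0: Variable(torch.FloatTensor(x_temp))
# Since 0.4.0 a plain tensor takes part in autograd itself, so no wrapper is needed.
x_train_tensor = torch.tensor(x_temp, dtype=torch.float32)

# Parameters of nn.Module objects already have requires_grad=True;
# an input tensor only needs it if you want gradients w.r.t. the input.
print(x_train_tensor.dtype, x_train_tensor.requires_grad)  # torch.float32 False
```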

That being said, the learning rate seems to be a bit high.
Have you tried other non-linear activations, e.g. ReLU?
I would recommend scaling down your model (and data) a bit and getting it to work with a simplified version.
Once your model trains successfully, you could try to scale it up again.
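One common way to follow the "simplified version" advice (a general debugging practice, not something the answer spells out) is to check that a small model can fit a handful of samples almost perfectly. The subset size, the hidden size of 16 and the Adam settings below are arbitrary assumptions, and the data is random.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small subset: 16 samples, 13 (already scaled) features, 1 target.
x_small = torch.randn(16, 13)
y_small = torch.randn(16, 1)

# A deliberately small model; the hidden size of 16 is an arbitrary choice.
model = nn.Sequential(nn.Linear(13, 16), nn.Sigmoid(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(x_small), y_small)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# If the model cannot drive this loss close to zero, the problem is more
# likely in the data pipeline or the model code than in the hyperparameters.
print('first loss:', losses[0], ' last loss:', losses[-1])
```

Once this works, the model and the amount of data can be increased step by step.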


Thank you very much for your suggestions!
I removed the Variables and now use the tensors directly.
When I remove the .data attribute from result and from the plt.scatter call, I get an error. How do I get around this? Removing it from print('Epoch:', epoch, ' Total Loss:', loss.data) worked without problems.
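One way to avoid .data (a sketch, since the exact error message is not shown in the thread): use .item() to get a Python float from a one-element tensor, .detach() before converting a multi-element tensor to NumPy, and torch.no_grad() for inference. The single nn.Linear layer and the random tensors are hypothetical stand-ins for the posted Network and data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
network = nn.Linear(13, 1)           # hypothetical stand-in for the Network class
criterion = nn.MSELoss()
x = torch.randn(8, 13)
y = torch.randn(8, 1)

loss = criterion(network(x), y)

# .item() turns a one-element tensor into a Python float without using .data;
# this value can be passed to print() or plt.scatter(epoch, loss_value, ...).
loss_value = loss.item()

# For tensors with more than one element, detach from the graph before
# converting to NumPy; .numpy() on a tensor that requires grad raises an error.
preds = network(x).detach().numpy()

# For the final prediction, torch.no_grad() avoids building a graph at all.
test_exp = torch.randn(1, 13)        # hypothetical test sample
with torch.no_grad():
    result = network(test_exp)[0][0].item()

print(type(loss_value), preds.shape, type(result))  # <class 'float'> (8, 1) <class 'float'>
```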

I tried a lot of learning rates. With most of them, the loss converges to 1.5.
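When interpreting that number, note that nn.MSELoss() uses reduction='mean' by default, so loss is already the mean squared error over all elements. Dividing it again by len(y_train_tensor), as loss_avg does in the posted loop, yields the MSE divided by the number of samples, so an "average loss" of 1.5 may correspond to a much larger per-sample error. A small check with hand-picked numbers:

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([[1.0], [2.0], [4.0]])
target = torch.tensor([[1.0], [3.0], [1.0]])

loss = nn.MSELoss()(y_pred, target)                    # default reduction='mean'
manual = ((y_pred - target) ** 2).mean()               # (0 + 1 + 9) / 3
summed = nn.MSELoss(reduction='sum')(y_pred, target)   # 0 + 1 + 9

print(loss.item(), manual.item(), summed.item())
```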

I also tried ReLU in my forward pass, but for thesis reasons I use sigmoid.
Or do you mean the linear layers? If so, could you tell me how to use a different type there? I wasn't able to find a working alternative.
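The linear layers hold the weights and are usually kept as they are; the activation is what gets applied between them, so "other non-linear activations" most likely refers to replacing the sigmoid (an interpretation of the answer). A sketch with nn.Sequential, where switching to ReLU means replacing each nn.Sigmoid() with nn.ReLU():

```python
import torch
import torch.nn as nn

# nn.Linear is the layer with weights; nn.Sigmoid / nn.ReLU are parameter-free
# activations applied between the linear layers.
model = nn.Sequential(
    nn.Linear(13, 13),
    nn.Sigmoid(),
    nn.Linear(13, 13),
    nn.Sigmoid(),
    nn.Linear(13, 1),   # one output value per sample
)

x = torch.randn(5, 13)
print(model(x).shape)  # torch.Size([5, 1])
```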

I'll try downsizing my model a little and see how the output changes.
Thank you so much so far!


I tried using a smaller model and dataset. This delivered less accurate values.
The average loss now converges to 2.
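A factor not discussed in the thread, offered only as an assumption: the posted test sample contains raw values such as 231216 next to 0/1 flags. Unscaled inputs of such different magnitudes make optimization poorly conditioned, and inside sigmoid layers large values saturate the activation, where gradients are nearly zero. A common remedy is to standardize each feature with statistics from the training data. The three-column data below is hypothetical.

```python
import torch

torch.manual_seed(0)

# Hypothetical training features on very different scales: a date-like value,
# an hour-like value and a 0/1 flag.
x_train = torch.stack([
    torch.randint(200000, 240000, (100,)).float(),
    torch.randint(0, 48, (100,)).float(),
    torch.randint(0, 2, (100,)).float(),
], dim=1)

# Per-feature mean and standard deviation, computed on the training data only.
mean = x_train.mean(dim=0)
std = x_train.std(dim=0).clamp_min(1e-8)   # guard against constant columns
x_train_scaled = (x_train - mean) / std

# New samples must be scaled with the same training statistics.
x_test = torch.tensor([[231216.0, 36.0, 1.0]])
x_test_scaled = (x_test - mean) / std

# An unscaled value like this saturates the sigmoid completely.
print(torch.sigmoid(torch.tensor(231216.0)))  # tensor(1.)
```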

I tried using ReLU in my __init__ method, but this error message comes up:

TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given

I don't know why Linear works while ReLU or Sigmoid won't.
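The error most likely comes from passing layer sizes to the activation module, e.g. nn.ReLU(13, 13) (an assumption, since the failing line is not shown). nn.Linear needs input and output sizes because it holds a weight matrix. nn.ReLU takes only an optional inplace flag and nn.Sigmoid takes no arguments, so the sizes belong to the linear layers alone. A small sketch:

```python
import torch
import torch.nn as nn

# nn.Linear(in_features, out_features) needs the layer sizes ...
layer = nn.Linear(13, 13)

# ... but activations have no sizes. nn.ReLU only accepts an optional
# `inplace` flag, so passing two numbers raises a TypeError.
raised = False
try:
    nn.ReLU(13, 13)
except TypeError as err:
    raised = True
    print('TypeError:', err)

activation = nn.ReLU()   # correct: no size arguments

x = torch.randn(2, 13)
out = activation(layer(x))
print(out.shape)  # torch.Size([2, 13])
```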