Can't get model to train

Hi there,

I am new to Python and PyTorch and stuck with my “little” project. I would like to train a model to predict a single value when it is given a record of 30 values. I have tensors of features and targets (provided below) for training and a similar set for validation.

While writing this, I noticed the structure of the features, and I guess the feature tensor isn't correct. It should be a list of 8760*3 records, but it is a list of 3 entries, each containing a list of 8760 records. Is there an easy way to fix this? (I'll have a look at the flatten function.)

It should select a batch of 10 records out of 8760*3 records for each training iteration.

I would like to get feedback on my code and the model, a hint regarding the error, and whether the goal can be achieved with my model.

import torch
from torch import nn
from torch.utils.data import random_split, DataLoader, TensorDataset
import numpy as np
import ann_model as annM
import feature_selections as fs


# Get cpu or gpu device for ann_training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

dataset_filename = 'file.npy'
dataset_filename = 'file2.npy' # sample input

print("Loading dataset: started")
print(dataset_filename, " will be loaded")
rawDataSet = np.load(dataset_filename, allow_pickle=True)
print("Loading dataset: finished")

# normalize all data?

print("Specify columns used for the modeling dataset")
featureSelections = fs.feature_selection()
# https://stackoverflow.com/questions/44429199/how-to-load-a-list-of-numpy-arrays-to-pytorch-dataset-loader
featureData = torch.tensor(np.array(rawDataSet[:][featureSelections.s02()["names"]].tolist()))
targetsData = torch.tensor(np.array(rawDataSet[:]["trafo_q"].tolist()))

print("build pytorch dataset")
dataset = TensorDataset(featureData, targetsData)

print("split dataset into train and test")
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train, test = random_split(dataset, [train_size, test_size])

print("Create train- and testsets with dataloader and batch size = 10")
batch_size = 10
trainset = DataLoader(train, batch_size=batch_size, shuffle=True)
testset = DataLoader(test, batch_size=batch_size, shuffle=True)

print("Creating NeuralNetwork Model")
numberOfFeatures = len(featureSelections.s02()["names"]) # number of features per record
model = annM.NeuralNetwork(numberOfFeatures) # create Model
model.to(device)

print("Optimizing Parameters")
optimizer = torch.optim.SGD(model.parameters(), lr = 0.9) # lr=1e-3
criterion = nn.BCELoss()
## TRAINING
print("Training NeuralNetwork")
model.train()
epochs = 5000
errors = []
for epoch in range(epochs):
    optimizer.zero_grad() # sets gradients to zero
    for feature, target in trainset:
        # Forward pass
        y_pred = model.forward(feature)
        # Compute Loss
        loss = criterion(y_pred.squeeze(), target)
        errors.append(loss.item())

        print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
        # Backward pass
        loss.backward()
        optimizer.step()
### MODEL
import torch.nn as nn
import torch.nn.functional as F


# Define model
class NeuralNetwork (nn.Module):
    def __init__(self, numberOfFeatures):
        super(NeuralNetwork, self).__init__()
        self.l0 = nn.Linear(numberOfFeatures, 1024)
        self.l1 = nn.Linear(1024, 512)
        self.l2 = nn.Linear(512, 128)
        self.l3 = nn.Linear(numberOfFeatures, 1)

    def forward(self, inputs):
        output = self.l0(inputs)
        output = F.relu(self.l1(output))
        output = F.relu(self.l2(output))
        output = F.relu(self.l3(output))
        return output

    def training(self, trainset, optimizer, criterion):
        # self.model.train() # set mode to train; the other mode, model.eval(), is called outside of the class
        epochs = 5000
        errors = []
        for epoch in range(epochs):
            optimizer.zero_grad() # sets gradients to zero
            for feature, target in trainset:
                # Forward pass
                y_pred = self.forward(feature)
                # Compute Loss
                loss = criterion(y_pred.squeeze(), target)
                errors.append(loss.item())

                print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
                # Backward pass
                loss.backward()
                optimizer.step()
## FEATURES
>>> feature
tensor([[[ 1.0000e+00, -6.5955e+01,  1.4160e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 2.0000e+00, -6.4064e+01,  1.4160e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 3.0000e+00, -6.1796e+01,  1.6460e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         ...,
         [ 8.7580e+03, -1.0670e+02,  4.3340e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 8.7590e+03, -1.0681e+02,  4.1720e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 8.7600e+03, -1.0487e+02,  4.3040e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00]],

        [[ 1.0000e+00, -7.6689e+01,  1.4160e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 2.0000e+00, -7.4489e+01,  1.4160e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 3.0000e+00, -7.1851e+01,  1.6460e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         ...,
         [ 8.7580e+03, -1.2383e+02,  4.3340e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 8.7590e+03, -1.2397e+02,  4.1720e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00],
         [ 8.7600e+03, -1.2170e+02,  4.3040e+01,  ...,  2.0000e+00,
           0.0000e+00,  2.0000e+00]],

        [[ 1.0000e+00, -3.7761e+01,  4.6055e+03,  ...,  2.7200e+02,
           1.6400e+02,  2.8600e+02],
         [ 2.0000e+00, -5.0640e+01,  4.6055e+03,  ...,  2.7200e+02,
           1.6400e+02,  2.8600e+02],
         [ 3.0000e+00, -4.5840e+01,  4.9473e+03,  ...,  2.7200e+02,
           1.6400e+02,  2.8600e+02],
         ...,
         [ 8.7580e+03,  3.6286e+01,  2.6821e+04,  ...,  2.7200e+02,
           1.6400e+02,  2.8600e+02],
         [ 8.7590e+03,  4.0603e+01,  2.6394e+04,  ...,  2.7200e+02,
           1.6400e+02,  2.8600e+02],
         [ 8.7600e+03,  3.9810e+01,  2.6441e+04,  ...,  2.7200e+02,
           1.6400e+02,  2.8600e+02]]], dtype=torch.float64)
>>> feature.shape
torch.Size([3, 8760, 30])
## TARGETS

>>> target
tensor([[ 31.6668,  33.5949,  35.8350,  ..., -26.1178, -24.8364, -24.1894],
        [ 36.8206,  39.0619,  41.6658,  ..., -30.3122, -28.8252, -28.0726],
        [  3.4025,   6.6635,   3.2758,  ..., -12.6210, -15.8263, -15.4595]],
       dtype=torch.float64)
>>> target.shape
torch.Size([3, 8760])
## ERROR
>>> y_pred = model.forward(feature)
Traceback (most recent call last):
  File "~Eclipse-2020-09\dropins\PyDev 8.0.1\plugins\org.python.pydev.core_8.0.1.202011071328\pysrc\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<console>", line 1, in <module>
  File "~workspace\q_estimation\ann_model.py", line 24, in forward
    output = self.l0(inputs)
  File "~Python\Python38\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~Python\Python38\site-packages\torch\nn\modules\linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "~\Python\Python38\site-packages\torch\nn\functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: expected scalar type Double but found Float

You are feeding float64 tensors to the model, which uses float32 parameters so you would have to transform the inputs via x = x.float() before feeding them to the model.
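A minimal sketch reproducing the dtype mismatch and the cast; the layer size and random data are placeholders, not the original model. Alternatively, the whole model could be converted with model.double(), which trades speed for precision.

```python
import torch
from torch import nn

layer = nn.Linear(30, 1)                       # parameters are float32 by default
x = torch.randn(10, 30, dtype=torch.float64)   # e.g. data converted from NumPy float64 arrays

mismatch_raised = False
try:
    layer(x)                                   # float64 input vs. float32 weights
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

out = layer(x.float())                         # cast the input to float32 first
print(out.dtype, out.shape)                    # torch.float32 torch.Size([10, 1])
```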

Also, your last non-linearity is F.relu, while nn.BCELoss expects probabilities in [0, 1]. I would recommend removing F.relu and passing the raw logits to nn.BCEWithLogitsLoss instead, which is numerically more stable than sigmoid + nn.BCELoss.
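As a quick sanity check of that recommendation, this sketch (random placeholder logits and binary targets) shows that nn.BCEWithLogitsLoss on raw outputs matches an explicit torch.sigmoid followed by nn.BCELoss, so the model's last layer can simply return the linear output:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(10)                       # raw model outputs, no final activation
targets = torch.randint(0, 2, (10,)).float()   # BCE-style targets must lie in [0, 1]

loss_logits = nn.BCEWithLogitsLoss()(logits, targets)        # sigmoid applied internally
loss_sigmoid = nn.BCELoss()(torch.sigmoid(logits), targets)  # explicit sigmoid + BCELoss

print(loss_logits.item(), loss_sigmoid.item())  # practically identical for moderate logits
```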

Thank you very much for your reply ptrblck.
This reply is both for documentation purposes and an invitation to help.

I flattened the tensors using

features = flatten(features, start_dim=0, end_dim=1)

changing the dimensions from torch.Size([3, 8760, 30]) to torch.Size([35040, 30]), which looks better.

Following your suggestions I changed:

model.forward(feature)
to
model.forward(feature.float())

The torch.set_default_dtype documentation (https://pytorch.org/docs/stable/generated/torch.set_default_dtype.html) explains that the default floating point dtype is initially torch.float32.

Adding torch.set_default_dtype(torch.float64)
after
super(NeuralNetwork, self).__init__()

should, as you suggested here, change the dtype to float64, but I get the error

undefined variable from import: float64
despite adding “import torch” prior to the class definition.

Since this is just a matter of precision and I only want a running model for starters, I am fine with casting to float32.

Following your second suggestion, I also changed

output = F.relu(self.l3(output))
return output
to
return self.l3(output)

Executing

model.forward(feature.float())
causes a
RuntimeError: mat1 and mat2 shapes cannot be multiplied...
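One likely source of this error in the model posted above: l2 outputs 128 features, but l3 is defined as nn.Linear(numberOfFeatures, 1), i.e. it expects 30 inputs. A minimal sketch of that mismatch, assuming 30 features and a batch of 10; the class names Original and Fixed are hypothetical:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Original(nn.Module):
    """Layer sizes as posted: l3 expects n_features inputs, but l2 outputs 128."""
    def __init__(self, n_features):
        super().__init__()
        self.l0 = nn.Linear(n_features, 1024)
        self.l1 = nn.Linear(1024, 512)
        self.l2 = nn.Linear(512, 128)
        self.l3 = nn.Linear(n_features, 1)   # mismatch: receives 128 features

    def forward(self, x):
        x = self.l0(x)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        return self.l3(x)

class Fixed(Original):
    def __init__(self, n_features):
        super().__init__(n_features)
        self.l3 = nn.Linear(128, 1)          # in_features must equal l2's out_features

batch = torch.randn(10, 30)                  # [batch_size, n_features]

shape_error = False
try:
    Original(30)(batch)
except RuntimeError as err:
    shape_error = True
    print("RuntimeError:", err)              # 10x128 activations vs. a layer expecting 30 inputs

out = Fixed(30)(batch)
print(out.shape)                             # torch.Size([10, 1])
```

Note that the batch size does not appear in any layer definition; each layer only needs the feature size of the previous layer's output.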

Judging from your response in this thread, the following line should be wrong, since the input is numberOfFeatures * batch_size:

self.l0 = nn.Linear(numberOfFeatures, 1024)
but changing this line to
self.l0 = nn.Linear(numberOfFeatures *batch_size, 1024)
still causes the following:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x30 and 300x1024)


The flattening looks wrong, since you are changing the batch size. Change the arguments to start_dim=1, end_dim=-1 (or just use the default).
Alternatively, use x = x.view(x.size(0), -1), which makes it a bit clearer (to me) how the tensor is flattened.

1.
I don't understand why the flattening looks wrong to you. In contrast to the feature tensor shown in my initial post, the tensor after flattening looks as it should (from my point of view). I want the input to be of shape torch.Size([35040, 30]) rather than torch.Size([3, 8760, 30]), i.e. a sample of 4 cars with 30 features and 8760 records each, of which 3 cars are used for training and 1 for validation. The training should focus on the records independently of the car, because the targets for a new car should be estimated based on the inputs of all cars. I hope this makes sense; therefore, the flattened inputs should be correct.

>>> features
tensor([[ 1.0000e+00, -6.5955e+01,  1.4160e+01,  ...,  2.0000e+00,
          0.0000e+00,  2.0000e+00],
        [ 2.0000e+00, -6.4064e+01,  1.4160e+01,  ...,  2.0000e+00,
          0.0000e+00,  2.0000e+00],
        [ 3.0000e+00, -6.1796e+01,  1.6460e+01,  ...,  2.0000e+00,
          0.0000e+00,  2.0000e+00],
        ...,
        [ 8.7580e+03,  3.6286e+01,  2.6821e+04,  ...,  2.7200e+02,
          1.6400e+02,  2.8600e+02],
        [ 8.7590e+03,  4.0603e+01,  2.6394e+04,  ...,  2.7200e+02,
          1.6400e+02,  2.8600e+02],
        [ 8.7600e+03,  3.9810e+01,  2.6441e+04,  ...,  2.7200e+02,
          1.6400e+02,  2.8600e+02]], dtype=torch.float64)

>>> features.shape
torch.Size([35040, 30])
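Since the stated goal is to estimate Q for an unseen car, one option worth considering is to hold out a whole car for validation before flattening, rather than calling random_split on the flattened records (which puts records of the same car into both sets). A minimal sketch with random placeholder data in the [cars, records, features] layout described above; treating the last car as the validation car is an assumption:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

n_cars, n_records, n_features = 4, 8760, 30
features = torch.randn(n_cars, n_records, n_features, dtype=torch.float64)  # placeholder data
targets = torch.randn(n_cars, n_records, dtype=torch.float64)               # one Q value per record

# Train on the first three cars, validate on the held-out fourth car.
train_x = features[:3].flatten(start_dim=0, end_dim=1).float()  # [3*8760, 30]
train_y = targets[:3].flatten().float()                         # [3*8760]
val_x = features[3:].flatten(start_dim=0, end_dim=1).float()    # [8760, 30]
val_y = targets[3:].flatten().float()                           # [8760]

train_loader = DataLoader(TensorDataset(train_x, train_y), batch_size=10, shuffle=True)
val_loader = DataLoader(TensorDataset(val_x, val_y), batch_size=10)

x, y = next(iter(train_loader))
print(train_x.shape, val_x.shape, x.shape, y.shape)
```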

2.
In addition, I changed the model in the following way because I could not figure out what was wrong. First I thought that the numberOfFeatures value (30) wasn't enough because it should be numberOfFeatures * batch_size, but that didn't work either, so I chose the easiest route that didn't cause any “mat1 and mat2 shapes cannot be multiplied” errors:

        self.l0 = nn.Linear(numberOfFeatures, 30)
        self.l1 = nn.Linear(30, 30)
        self.l2 = nn.Linear(30, 30)
        self.l3 = nn.Linear(numberOfFeatures, 1)

I haven't figured it out yet; I will look into this later today.

The model no longer raises any errors, but there is still at least one thing wrong. The first two training iterations of the first epoch look “fine”, but the third does not. The feature and target tensors are neither empty nor NaN:

Epoch 0: train loss: -1894.735568661991
Epoch 0: train loss: -4.14948531980407e+19
Epoch 0: train loss: nan

Is this behaviour related to the structure of the model and the design of the forward function?
Thank you very much for helping me out!

Where am I changing the batch size? In the first layer of the NN?
flatten(x) and x.view(x.size(0), -1) give different results. Was either of them just a “non-working” hint, or am I missing something?

>>> flatten(features).shape
torch.Size([1051200])

>>> features.view(features.size(0), -1).shape
torch.Size([4, 262800])

>>> features.view(-1).shape
torch.Size([1051200])
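The difference comes from the defaults: the function torch.flatten uses start_dim=0 and collapses every dimension, while the module nn.Flatten defaults to start_dim=1 and keeps dim0. A sketch with a tensor of the same [4, 8760, 30] shape (zeros as placeholder data):

```python
import torch
from torch import nn

x = torch.zeros(4, 8760, 30)                         # [cars, records, features]

a = torch.flatten(x)                                 # start_dim=0: everything -> 1D
b = nn.Flatten()(x)                                  # start_dim=1: keeps dim0
c = x.view(x.size(0), -1)                            # same result as nn.Flatten
d = torch.flatten(x, start_dim=0, end_dim=1)         # merges cars and records

print(a.shape)   # torch.Size([1051200])
print(b.shape)   # torch.Size([4, 262800])
print(c.shape)   # torch.Size([4, 262800])
print(d.shape)   # torch.Size([35040, 30])
```

Which one is appropriate depends on what dim0 means: if it is the batch dimension, keep it (nn.Flatten or view); if the first two dimensions are [cars, records] and each record is one sample, merging them gives nn.Linear the [samples, features] layout it expects.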

What am I currently trying to accomplish?
The input of the forward function needs to be 1D, so my first attempt was better

but still wrong, and the call should be

y_pred = model.forward(flatten(feature).float())

right?

self.l0 = nn.Linear(numberOfFeatures*batchSize, 30)
self.l1 = nn.Linear(30, 30)
self.l2 = nn.Linear(30, 30)
self.l3 = nn.Linear(numberOfFeatures, 1)

works up to the loss calculation

criterion(y_pred.squeeze(), target)

because (I guess) the flattened input of 300 values no longer corresponds to the 10 target values, which causes the following error:

ValueError: Target size (torch.Size([10])) must be the same as input size (torch.Size([]))

The question is: where and how do I change the target size?

In the simple use case nn.Linear layers expect a 2D input in the shape [batch_size, nb_features] (you can add an additional dimension, which is described in more detail in the docs).
You are currently flattening the features into the batch dimension, which is usually wrong. I’m not familiar with your use case, so it might be exactly what you want to achieve, but you should be able to explain why this is needed. In your current code, you would not treat the input as e.g. batch_size=10 samples, each containing nb_features features, but would instead increase the number of samples by multiplying the batch size with the feature dimension.

The proposed way of flattening the activation (either via the default arguments of nn.Flatten or via the view operation) keeps the batch dimension unchanged (i.e. the number of samples in a batch stays the same) and flattens all other dimensions into dim1.

This line of code for example:

self.l0 = nn.Linear(numberOfFeatures*batchSize, 30)

changes the number of samples in a batch and will thus yield a shape mismatch in the loss calculation unless you also change the target.

The use case is the following:
Let's say there are 4 objects with 31 features, including feature “Q”, and each feature of each object has 8760 values (dataset.dim = [4, 8760, 31]). I would like to train a neural network to estimate Q for any given object as well as for new objects that only have the other 30 features (“Q” excluded).

That being said, the feature-tensor should have 4*8760 records and 30 “columns”

and a target-tensor of 4*8760 records and 1 “column”.

>>> features.shape
torch.Size([35040, 30])

Splitting the dataset into training and validation sets creates a feature-training-tensor [3*8760, 30] and a target-training-tensor [3*8760, 1]. Regarding the dimensions, I would assume that either the target tensor is [3*8760] and y_pred is “squeezed” to one dimension, or the target tensor is [3*8760, 1] and no squeezing is required. Is that assumption correct?
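Both variants work as long as prediction and target end up with the same shape. A minimal sketch with random placeholder tensors showing the two options, and why squeeze(1) is safer than a bare squeeze() when a batch happens to contain a single sample (e.g. a short last batch):

```python
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()
y_pred = torch.randn(10, 1)                       # model output: [batch_size, 1]
target = torch.rand(10)                           # dataset target: [batch_size], values in [0, 1)

loss_a = criterion(y_pred.squeeze(1), target)     # option 1: [10] vs. [10]
loss_b = criterion(y_pred, target.unsqueeze(1))   # option 2: [10, 1] vs. [10, 1]
print(torch.allclose(loss_a, loss_b))             # True

# With a single-sample batch, a bare squeeze() also removes the batch dimension.
single = torch.randn(1, 1)
print(single.squeeze().shape, single.squeeze(1).shape)  # torch.Size([]) torch.Size([1])
```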

Initializing the model does not require the batch size; the input layer only takes the number of features (30), and that's it.

For each batch of e.g. 10 records, 10 records out of [3*8760, 30] are selected from the feature tensor and passed to the input layer.
I wonder how the samples of each batch are processed. The input layer has one neuron for each feature, so the samples would need to be processed sequentially, but how? There is no loop in the forward function…

The current model, which produces NaN on the third training iteration:

        self.l0 = nn.Linear(numberOfFeatures, 60)
        self.l1 = nn.Linear(60, 90)
        self.l2 = nn.Linear(90, 30)
        self.l3 = nn.Linear(30, 1)
    def forward(self, inputs):
        output = self.l0(inputs)
        output = F.relu(self.l1(output))
        output = F.relu(self.l2(output))
        return self.l3(output)
batch_size = 10
optimizer = torch.optim.SGD(model.parameters(), lr = 0.05) # lr=1e-3
criterion = nn.BCEWithLogitsLoss()

model.train()
epochs = 5000
errors = []
for epoch in range(epochs):
    optimizer.zero_grad() # sets gradients to zero
    for feature, target in trainset:
        y_pred = model.forward(feature.float())
        loss = criterion(y_pred.squeeze(), target.float())
        errors.append(loss.item())
        print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
        loss.backward()
        optimizer.step()
>>> feature.shape
torch.Size([10, 30])
>>> target.shape
torch.Size([10])
# within for feature, target in trainset: loop
Epoch 0: train loss: 62.11237716674805
Epoch 0: train loss: -1679589769216.0
Epoch 0: train loss: nan
>>> model.forward(feature.float())
tensor([[nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan]], grad_fn=<AddmmBackward>)

Both feature and target tensors contain float32 data.
A few possible reasons were provided here.

I will try normalization first, since there are large values in the data. (source)

Since your loss is getting negative values (and underflows) I guess your target is not using values in [0, 1], so you should check it.
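Binary cross-entropy assumes targets in [0, 1]; with a target outside that range the loss has no lower bound, so the optimizer can push the logits (and the loss) towards -inf, which matches the diverging negative losses above. Since trafo_q looks like a continuous quantity, a regression loss such as nn.MSELoss may be the more natural choice; that is a suggestion, not something established in this thread. A small sketch using the first target value shown above:

```python
import torch
from torch import nn

bce = nn.BCEWithLogitsLoss()
target = torch.tensor([31.6668])     # an unnormalized target value, like the first trafo_q entry

# BCE with a target > 1: the loss keeps decreasing as the logit grows.
logits = (0.0, 10.0, 100.0)
losses = [bce(torch.tensor([z]), target).item() for z in logits]
for z, loss in zip(logits, losses):
    print(f"logit {z:6.1f} -> BCE loss {loss:10.2f}")

# A regression loss stays non-negative and is minimized when prediction == target.
mse = nn.MSELoss()
mse_loss = mse(torch.tensor([30.0]), target).item()
print(f"MSE loss {mse_loss:.4f}")
```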

First of all, I would like to thank you for your support!!!

Here is how I normalized both features and targets, which are tensors of tensors. The goal was to normalize all values with the same inner tensor position (i.e. for each column), based on the global maximum and minimum of that position (“column”).

featuresMax = torch.max(features, dim=0)[0]
featuresMin = torch.min(features, dim=0)[0]

for i in range(0, features.shape[1]):
    for j in range(0, len(features)):
        features[j][i] = (features[j][i] - featuresMin[i])/(featuresMax[i]-featuresMin[i]) # range 0 to 1
targetMax = torch.max(targets, dim=0)[0]
targetMin = torch.min(targets, dim=0)[0]

for i in range(0, len(targets)):
    targets[i] = (targets[i] - targetMin)/(targetMax-targetMin) # range 0 to 1
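The same per-column min-max scaling can be written without Python loops via broadcasting, which is much faster on a [35040, 30] tensor. Two assumptions in this sketch go beyond the original code: the min/max are computed on the training split only and reused for validation data, and constant columns get a range of 1 to avoid 0/0 = NaN. The helper names fit_min_max and apply_min_max are hypothetical:

```python
import torch

def fit_min_max(x: torch.Tensor):
    """Return per-column minimum and range (max - min) along dim 0."""
    col_min = x.min(dim=0).values
    col_range = x.max(dim=0).values - col_min
    # Constant ("invariant") columns would give 0/0 = NaN; use a range of 1 instead.
    col_range = torch.where(col_range == 0, torch.ones_like(col_range), col_range)
    return col_min, col_range

def apply_min_max(x, col_min, col_range):
    """Rescale relative to the fitted min/max; broadcasts over the rows."""
    return (x - col_min) / col_range

torch.manual_seed(0)
train_x = torch.randn(100, 30) * 1000   # placeholder training features
val_x = torch.randn(20, 30) * 1000      # placeholder validation features
train_x[:, 5] = 2.0                     # an invariant column
train_y = torch.randn(100) * 30         # placeholder 1D targets

x_min, x_range = fit_min_max(train_x)   # statistics from the training split only
train_scaled = apply_min_max(train_x, x_min, x_range)
val_scaled = apply_min_max(val_x, x_min, x_range)   # may fall slightly outside [0, 1]

y_min, y_range = fit_min_max(train_y)   # works for the 1D target tensor as well
train_y_scaled = apply_min_max(train_y, y_min, y_range)

print(train_scaled.min().item(), train_scaled.max().item())
```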

The training loop:

model.train()
epochs = 5
errors = []
for epoch in range(epochs):
    optimizer.zero_grad() # sets gradients to zero
    for feature, target in trainset:
        y_pred = model.forward(feature)
        loss = criterion(y_pred.squeeze(), target)
        errors.append(loss.item())
        print('Epoch {}: train loss: {} MaxTarget: {} MinTarget: {} maxFeature: {} MinFeature: {}'.format(epoch, loss.item(), amax(target), amin(target), amax(feature), amin(feature)))
        # print('Epoch {}: train loss: {} '.format(epoch, loss.item()))
        loss.backward()
        optimizer.step()

Plotting the losses with plt.plot(errors) over all iterations of 5 epochs (len(trainset) = 2804) does not look good, because the losses increase instead of decreasing.
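One thing worth checking in the loop above: optimizer.zero_grad() is called once per epoch, outside the batch loop, so every loss.backward() adds to the same .grad buffers and each optimizer.step() uses the sum of all gradients seen so far in that epoch, which alone can make the loss grow over the 2804 iterations. The usual pattern clears the gradients once per batch. A minimal sketch; the nn.Sequential model stands in for NeuralNetwork, the data are random placeholders already scaled to [0, 1], and nn.MSELoss is an assumption (regression on a continuous target):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.rand(1000, 30)          # placeholder, already scaled to [0, 1]
targets = torch.rand(1000)               # placeholder, already scaled to [0, 1]
loader = DataLoader(TensorDataset(features, targets), batch_size=10, shuffle=True)

model = nn.Sequential(nn.Linear(30, 60), nn.ReLU(), nn.Linear(60, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.MSELoss()                 # assumption: regression loss

errors = []
model.train()
for epoch in range(5):
    for feature, target in loader:
        optimizer.zero_grad()                  # clear gradients from the previous batch
        y_pred = model(feature).squeeze(1)     # call the module (runs hooks); [10, 1] -> [10]
        loss = criterion(y_pred, target)
        loss.backward()                        # gradients of this batch only
        optimizer.step()
        errors.append(loss.item())
    print(f"Epoch {epoch}: last batch loss {errors[-1]:.4f}")
```

This may also explain part of the batch-size effect described below: with batch_size=1000 an epoch has far fewer iterations, so far fewer gradients accumulate before the next zero_grad(). That is a hypothesis, not something verified here.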

I tried changing the following:

  • removing features (rather invariant ones)
  • removing hidden layers
  • changing the learning rate
  • reducing the amount of neurons of the hidden layers
  • changing the batch size

Changing the batch size had the largest impact on the loss (same number of features, batch size increased from 10 to 1000, epochs increased from 5 to 50).

Questions:
Am I heading in the right direction - does the model “work as is” and it is “just” a matter of tweaking the parameters (activation functions, lr, etc.) and sets of features?
Is it plausible that the changed batch size has such a (positive) impact?
Is there a rule of thumb with regard to batch size, epoch size and sample size?

Exactly what I needed! You just saved me several hours. Thanks!