Accuracy doesn't change

Hi all,

The model that I’m trying to train doesn’t seem to increase its accuracy. I’m used to the MLPClassifier from scikit-learn because it’s what we used in our data science classes, but I was hoping to use PyTorch to create more advanced models. I’m using the skorch wrapper library to remove a lot of boilerplate code.

Purpose of the model

I’m working with radar data of road surfaces. The data contains lots of echoes of the road surface. The purpose of the model is to detect the actual road. When converting a slice of the data to an image, it looks like this:

[image: a radar slice with the MLPClassifier prediction drawn as a red line and the manually labeled road surface as green dots]

The red line is what scikit-learn’s MLPClassifier predicts and the green dots are the actual road surfaces, which were manually labeled to create the training data. With the MLPClassifier I was able to use the position as the target label. It worked on a per-project basis, but got too slow when trying to combine all the projects. I was hoping to use PyTorch, which can use my GPU, to create a more advanced model.
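For reference, this is roughly what that baseline looked like (a minimal sketch; the hidden layer size here is just an example, not what I actually tuned):

from sklearn.neural_network import MLPClassifier
import pandas as pd

# Each of the 512 sample positions is a class, so the labeled
# Y position is used directly as the target label
X = pd.read_csv("project-input.csv").values
y = pd.read_csv("project-predictions.csv")["Y"].values

clf = MLPClassifier(hidden_layer_sizes=(2048,))
clf.fit(X, y)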

Data

The data consists of two files: the input file and the prediction/label file. I’ve already processed them so that each row in the input file corresponds to the same row in the label file.

Input file

The input data is a CSV file where every row consists of 512 floating point numbers. The numbers range from roughly -5000 to 5000.

    0      1    ...    510      511
512.0  230.0    ...  -1836.0  -1836.0
512.0  225.0    ...  -1802.0  -1802.0
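As a quick sanity check of that layout (assuming the header row shown above):

import pandas as pd

data = pd.read_csv("project-input.csv", index_col=None)
print(data.shape)                            # (n_rows, 512)
print(data.values.min(), data.values.max())  # roughly -5000 and 5000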

Label file

This file consists of two columns, but only the Y column is used. The values in this column are used as the target labels.

X    Y
1  211
3  212

Code

As I said earlier, I’m using the skorch library to remove a lot of the boilerplate code. I have tried using PyTorch directly, but I had similar issues where the accuracy didn’t change between epochs. This makes me think either my model/layers are wrong or I’m making a mistake while feeding the data to the model. The code I’m using looks like the following:

import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import skorch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Read the input data (512 float columns per row)
data = pd.read_csv("project-input.csv", index_col=None)
X = data

# Read the label data; only the Y column is used as the target
predictions = pd.read_csv("project-predictions.csv", index_col=None)
y = predictions['Y']

# The model: one output class per possible Y position
class Multiclass(nn.Module):
    def __init__(self, num_outputs=512):
        super().__init__()
        self.dropout = nn.Dropout(0.5)
        self.hidden = nn.Linear(512, 2048)
        self.output = nn.Linear(2048, num_outputs)

    def forward(self, x):
        x = x.reshape(-1, self.hidden.in_features)
        x = F.relu(self.hidden(x))
        x = self.dropout(x)
        x = F.softmax(self.output(x), dim=-1)
        return x

net = skorch.classifier.NeuralNetClassifier(
    module=Multiclass,
    device=device
)

X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y.values, dtype=torch.int64)

X = X.to(device)
y = y.to(device)

net.fit(X, y)

This runs, but the accuracy does not improve.

  epoch    train_loss    valid_acc    valid_loss      dur
-------  ------------  -----------  ------------  -------
      1        6.2400       0.0008        6.2409  20.5180
      2        6.2403       0.0008        6.2409  17.4280
      3        6.2403       0.0008        6.2409  18.2130
      4        6.2400       0.0008        6.2409  19.0680
      5        6.2402       0.0008        6.2409  20.5232

I was hoping there is an obvious mistake that people can see. As I said earlier, I’m mostly familiar with the MLPClassifier, so the most I’ve previously experimented with was the hidden layers.

The NeuralNetClassifier seems to use nn.NLLLoss as its default criterion, so you would need to pass log probabilities to it via F.log_softmax instead of probabilities via F.softmax. Change it and see if this improves the PyTorch model training.
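For example, the end of your forward would change like this (a minimal sketch of that fix):

def forward(self, x):
    x = x.reshape(-1, self.hidden.in_features)
    x = F.relu(self.hidden(x))
    x = self.dropout(x)
    # nn.NLLLoss expects log probabilities, not probabilities
    return F.log_softmax(self.output(x), dim=-1)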

Thanks for the fast response!

I’ve tried replacing F.softmax(self.output(x), dim=-1) with F.log_softmax(self.output(x), dim=-1) and now it outputs:

  epoch    train_loss    valid_acc    valid_loss      dur
-------  ------------  -----------  ------------  -------
      1           nan       0.0000           nan  10.8670
      2           nan       0.0000           nan  6.3860
      3           nan       0.0000           nan  5.3280
      4           nan       0.0000           nan  5.2080
      5           nan       0.0000           nan  5.6330
      6           nan       0.0000           nan  5.8010

Could you print the outputs before returning them from forward to check if they are overflowing?

Adding a print(x) just before returning from forward results in the following:

tensor([[-1561.2009, -2532.7739,  -892.7590,  ..., -2790.2212, -2430.4778,
         -3008.7415],
        [-3527.6836, -3519.8684,  -970.9326,  ..., -2911.9658, -2074.8315,
         -2793.2590],
        [-2680.8403, -2953.5088, -2104.6016,  ..., -2319.5645, -1220.5420,
         -4527.9507],
        ...,
        [-1753.6694, -1107.4957, -2239.1433,  ..., -2213.6201, -2000.9958,
         -1967.8594],
        [-2916.8323, -3349.4568, -1327.4064,  ..., -2687.6819, -1950.6512,
         -3970.4326],
        [-1833.4132, -2364.8330, -1087.6481,  ..., -2808.8252,  -785.6394,
         -2791.5884]], device='cuda:0', grad_fn=<LogSoftmaxBackward0>)

I’m not familiar enough with PyTorch (or machine learning in general) to know if it’s overflowing. How can I recognize that?

Can you backprop if your output is an integer?

Are all values valid in this output and is torch.isfinite(output).all() returning True?
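For example, right before the return in forward (a minimal sketch):

out = F.log_softmax(self.output(x), dim=-1)
# torch.isfinite flags NaN and +/-inf entries; .all() should be tensor(True)
print(torch.isfinite(out).all())
return out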