How can I optimize this code? (Code given)

I’m programming a neural network that should predict whether a person is sick or healthy. I have 250 samples, each with 5 properties (features) followed by the resulting state of health. With my program I achieve 88% accuracy. How can I improve it? Here is the code:

```python
import numpy as np
import torch
import torch.nn as nn

def train(M):
    n_steps = 60
    learning_rate = 0.01
    input_size = 5
    output_size = 1

    # Each row of M: 5 feature values followed by the 0/1 health label.
    X = M[:, :-1].astype(np.float32)
    y = M[:, -1].astype(np.float32)
    X = torch.from_numpy(X)
    y = torch.from_numpy(y)
    # Per-feature means, used to center the inputs.
    feature_means = torch.mean(X, dim=0)

    class Model(nn.Module):
        def __init__(self, input_size):
            super().__init__()  # required before registering submodules
            neur = 15
            self.layers = nn.Sequential(
                nn.Linear(input_size, neur),
                nn.Linear(neur, neur),
                nn.Linear(neur, neur),
                nn.Linear(neur, neur),
                nn.Linear(neur, neur),
                nn.Linear(neur, neur),
                nn.Linear(neur, neur),
                nn.Linear(neur, output_size),
            )

        def forward(self, x):
            x = x - feature_means  # center each feature
            out = self.layers(x)   # raw logits, one per sample
            return out

    model = Model(input_size)

    # BCEWithLogitsLoss applies the sigmoid itself, so the model outputs logits.
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    for e in range(n_steps):
        outputs = model(X)[:, 0]
        cost = criterion(outputs, y)

        # Backpropagate and update the weights.
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        # A logit > 0 corresponds to a predicted probability > 0.5.
        pred_labels = (outputs > 0).float()
        accuracy = (pred_labels == y).float().mean().item()
    return model
```

Thanks for your help :)

Hello Karsten!

Hopefully some forum experts (that is, people other than me)
will have some informed suggestions about how you might
improve your model. I do have some superficial comments
to get things started.

First, as I understand it, you have (only) 250 samples. Each
sample has five properties (inputs), which, I assume, could be
something like {height, weight, blood pressure, cholesterol,
number of cigarettes smoked per day}. Just for concreteness,
I’ll think of them as numbers between 0 and 1. And then each
sample has a single binary output (e.g., {unhealthy, healthy})
which I’ll imagine is coded as a 0 or 1. Is this a reasonable
approximation of what you have?

Some comments on your model. If I understand how Sequential
works, I think there is no benefit in having two Linears (fully
connected layers) in a row. The second one is redundant.
(In essence, you are multiplying two neur x neur weight
matrices together to get a single neur x neur matrix.)
I imagine that having the extra, redundant weights could slow
down or otherwise confuse your training. In a similar vein,
ReLU followed by PReLU seems to me to be equivalent to
just a single PReLU. Also, Tanhshrink followed by ReLU is
similar (although not identical) to just a ReLU when taking into
account the bias weights in the previous Linear layer.
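To make the "two Linears in a row" point concrete, here is a small sketch (random weights and a hypothetical batch of 8 samples) checking that two stacked `nn.Linear` layers with no nonlinearity between them compute the same function as a single `nn.Linear` whose weight is the product of the two weight matrices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
neur = 15

# Two fully connected layers with no activation between them.
first = nn.Linear(neur, neur)
second = nn.Linear(neur, neur)

# second(first(x)) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2),
# so the pair collapses into one Linear with these weights.
merged = nn.Linear(neur, neur)
with torch.no_grad():
    merged.weight.copy_(second.weight @ first.weight)
    merged.bias.copy_(second.weight @ first.bias + second.bias)

x = torch.randn(8, neur)
stacked_out = second(first(x))
merged_out = merged(x)
print(torch.allclose(stacked_out, merged_out, atol=1e-5))  # True
```

Only a nonlinearity between the layers (ReLU, Tanh, and so on) lets extra depth represent anything a single Linear cannot.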

Even trimming those redundancies out, it still seems like you
have an awful lot of layers and parameters for a five-feature
binary classification problem. I would be worried about
overfitting, especially with only 250 samples. (You have
almost as many trainable parameters as you have numbers
in your data set.)
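For the layer stack exactly as posted (eight Linears, no activations), the trainable-parameter count can be checked directly. This sketch just rebuilds that stack and sums `numel()` over its parameters, then compares it with the number of values in a 250 × 6 data set.

```python
import torch.nn as nn

neur, input_size, output_size = 15, 5, 1

# Same layer stack as in the posted code: 8 Linears, no activations.
layers = nn.Sequential(
    nn.Linear(input_size, neur),
    *[nn.Linear(neur, neur) for _ in range(6)],
    nn.Linear(neur, output_size),
)

n_params = sum(p.numel() for p in layers.parameters() if p.requires_grad)
n_numbers = 250 * (5 + 1)  # 250 samples, 5 features + 1 label each
print(n_params, n_numbers)  # 1546 1500
```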

If you aren’t already doing so, you should split your data set
into a training set and a validation set, e.g., 200 samples
for training and the remaining 50 samples for validation.
Then you train (fit) your model just with your training set
and then see how well it works on your “out-of-sample”
validation set. You do want to look at your model’s accuracy
on the training set, but you must also check it on the
validation set. If you get good accuracy on the former,
but not so good on the latter, you are overfitting.
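A minimal sketch of that workflow, using randomly generated stand-in data (hypothetical; the real 5-feature data would go here) and a small hypothetical model: shuffle once, fit only on the 200 training samples, then report accuracy on both sets. The `accuracy` helper is an illustrative name, not a PyTorch function.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in data: 250 samples, 5 features in [0, 1], 0/1 labels.
X = torch.rand(250, 5)
y = (X.sum(dim=1) > 2.5).float()

# Shuffle once, then hold out the last 50 shuffled samples for validation.
perm = torch.randperm(len(X))
train_idx, val_idx = perm[:200], perm[200:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

def accuracy(model, X, y):
    """Fraction of samples whose thresholded logit matches the 0/1 label."""
    with torch.no_grad():
        pred = (model(X)[:, 0] > 0).float()
    return (pred == y).float().mean().item()

model = nn.Sequential(nn.Linear(5, 15), nn.ReLU(), nn.Linear(15, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for step in range(200):
    # Fit on the training set only; the validation set is not used here.
    optimizer.zero_grad()
    loss = criterion(model(X_train)[:, 0], y_train)
    loss.backward()
    optimizer.step()

train_acc = accuracy(model, X_train, y_train)
val_acc = accuracy(model, X_val, y_val)
print(f"train accuracy {train_acc:.2f}, validation accuracy {val_acc:.2f}")
```

A large gap between the two printed numbers is the overfitting signal described above.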

If I understand your code correctly, it looks to me like you
are using your entire training set for each step of your
Adam optimization algorithm. (That is, you are using a
“batch size” of 250.) In general, breaking your training
set up into smaller batches is thought to work better.
First, you do less computation per optimizer step, so your
training should run faster. But more importantly, even
though using your whole training set as a single batch gives
the most accurate loss-function gradient for that optimization
step, it is thought that using the noisier, less accurate
gradients you get from smaller (randomly chosen) batches
is actually better. The noise can help keep your gradient
descent algorithm from “getting stuck,” and the noise is
thought to also help reduce overfitting.

I don’t have a good rationale for choosing a batch size, so
maybe the experts can help out, but I would think (especially
with the small size of your training set) somewhere in the
range of 5 to 25 might be appropriate.
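A sketch of mini-batch training with PyTorch's `DataLoader`, assuming hypothetical random training data and an arbitrary batch size of 16 (one choice from the 5 to 25 range suggested above). With `shuffle=True` the samples are reshuffled every epoch, so each optimizer step sees a different random subset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical training data: 200 samples, 5 features, 0/1 labels.
X_train = torch.rand(200, 5)
y_train = (X_train.sum(dim=1) > 2.5).float()

# shuffle=True draws a fresh random order every epoch, so each
# mini-batch gives a different, noisier gradient estimate.
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(5, 15), nn.ReLU(), nn.Linear(15, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

n_epochs = 20
n_updates = 0
for epoch in range(n_epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb)[:, 0], yb)
        loss.backward()
        optimizer.step()  # one optimizer step per mini-batch
        n_updates += 1

print(n_updates)  # 20 epochs * ceil(200 / 16) = 260 optimizer steps
```

Compared with the full-batch loop, the same 20 passes over the data now give 260 weight updates instead of 20.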

It would be helpful to us if you could show us plots (or tables)
of the training and validation loss and accuracy as a function
of training iteration.
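One way to collect those numbers is to evaluate both sets after every optimizer step and keep a history. This sketch (hypothetical random data, hypothetical small model, an illustrative `evaluate` helper) prints every tenth row as a table; the same `history` columns could be handed to a plotting library instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data and split (200 train / 50 validation).
X = torch.rand(250, 5)
y = (X.sum(dim=1) > 2.5).float()
perm = torch.randperm(250)
X_train, y_train = X[perm[:200]], y[perm[:200]]
X_val, y_val = X[perm[200:]], y[perm[200:]]

model = nn.Sequential(nn.Linear(5, 15), nn.ReLU(), nn.Linear(15, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def evaluate(X, y):
    """Return (loss, accuracy) without updating the model."""
    with torch.no_grad():
        logits = model(X)[:, 0]
        loss = criterion(logits, y).item()
        acc = ((logits > 0).float() == y).float().mean().item()
    return loss, acc

history = []  # rows: (step, train_loss, train_acc, val_loss, val_acc)
for step in range(100):
    optimizer.zero_grad()
    criterion(model(X_train)[:, 0], y_train).backward()
    optimizer.step()
    history.append((step, *evaluate(X_train, y_train), *evaluate(X_val, y_val)))

print("step  train_loss  train_acc  val_loss  val_acc")
for row in history[::10]:
    print("{:4d}  {:10.4f}  {:9.3f}  {:8.4f}  {:7.3f}".format(*row))
```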

(As an aside, in fairness to the broader world of machine learning,
for a problem this seemingly simple, you might want to compare
your neural-network results with an “old-school” approach such
as AdaBoost on decision stumps or building a support-vector machine,
or even just running a linear regression.)
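As a cheap baseline in the same framework, a single `nn.Linear` layer trained with `BCEWithLogitsLoss` is logistic regression, the classification counterpart of the linear-regression suggestion. This sketch swaps that in (it does not cover AdaBoost or SVMs, which would typically come from a separate library) and uses hypothetical random data; its validation accuracy is the number the deeper network would need to beat.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data and split, as a stand-in for the real 250 samples.
X = torch.rand(250, 5)
y = (X.sum(dim=1) > 2.5).float()
perm = torch.randperm(250)
X_train, y_train = X[perm[:200]], y[perm[:200]]
X_val, y_val = X[perm[200:]], y[perm[200:]]

# Logistic regression: one Linear layer (5 weights + 1 bias) feeding
# BCEWithLogitsLoss, which applies the sigmoid internally.
baseline = nn.Linear(5, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(baseline.parameters(), lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    criterion(baseline(X_train)[:, 0], y_train).backward()
    optimizer.step()

with torch.no_grad():
    val_pred = (baseline(X_val)[:, 0] > 0).float()
val_acc = (val_pred == y_val).float().mean().item()
n_params = sum(p.numel() for p in baseline.parameters())
print(f"logistic-regression validation accuracy: {val_acc:.2f} ({n_params} parameters)")
```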

Have fun!

K. Frank


Thank you very much, Frank!
I removed the redundant hidden layers as you suggested. But how can I prevent overfitting? And how do I separate the data used for training from the data used to measure accuracy? My approach would be as follows:

```python
import numpy as np

def train_test_split(M):
    train_data = M[:round(len(M)*0.8)]
    test_data = M[:round(len(M)*0.2)]
    return train_data, test_data
```

Hi Karsten!

As far as your data splitting goes, you should try what you have
and see. Take a look at your split data and see if it makes sense
to you.

(Hint: Your code will use the first 80% of your data for training,
and the first 20% of your data for testing. So your testing data
will also be included in your training data, and that isn’t really what you want.)
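One way to get non-overlapping sets is to shuffle the rows once and take the rows after the training cut (not the first rows again) for testing. Shuffling also helps if the file happens to be ordered, for example by label. A minimal sketch, assuming the label is the last column of `M` as in the training code; the `train_fraction` and `seed` parameters and the example array are hypothetical additions.

```python
import numpy as np

def train_test_split(M, train_fraction=0.8, seed=0):
    """Shuffle the rows of M, then cut them into non-overlapping train/test parts."""
    rng = np.random.default_rng(seed)
    shuffled = M[rng.permutation(len(M))]
    n_train = round(len(M) * train_fraction)
    train_data = shuffled[:n_train]  # first 80% of the shuffled rows
    test_data = shuffled[n_train:]   # remaining 20%, never used for training
    return train_data, test_data

M = np.arange(250 * 6, dtype=np.float32).reshape(250, 6)  # hypothetical data
train_data, test_data = train_test_split(M)
print(train_data.shape, test_data.shape)  # (200, 6) (50, 6)
```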

It’s hard to say much about overfitting. You haven’t really shared
anything with us about the actual structure of your data, so
it’s hard to have any intuition about your problem.

I still think you should make plots of your loss function and
accuracy for both your training and validation data. What might
those plots look like, and what might they tell you (and us)?


K. Frank