# How can I optimize this code? (Code given)

Hi,
I’m programming a neural network that should predict whether a person is sick or healthy. I have 250 samples, each with 5 properties and the resulting state of health. With my program I reach 88% accuracy. How can I improve it? Here is the code:

``````
import torch
import numpy as np
import torch.nn as nn

def train(M):
    n_steps = 60
    learning_rate = 0.01
    input_size = 5
    output_size = 1

    # The last column of M is the label; the other columns are the features.
    X = M[:, :-1].astype(np.float32)
    y = M[:, -1].astype(np.float32)
    X = torch.from_numpy(X)
    y = torch.from_numpy(y)
    feature_means = torch.mean(X, dim=0)

    class Model(nn.Module):
        def __init__(self, input_size):
            super().__init__()
            neur = 15
            self.layers = nn.Sequential(
                nn.Linear(input_size, neur),
                nn.PReLU(),
                nn.Linear(neur, neur),
                nn.Linear(neur, neur),
                nn.ReLU(),
                nn.Linear(neur, neur),
                nn.LogSigmoid(),
                nn.Linear(neur, neur),
                nn.ReLU(),
                nn.PReLU(),
                nn.Linear(neur, neur),
                nn.Tanhshrink(),
                nn.ReLU(),
                nn.Linear(neur, neur),
                nn.ReLU(),
                nn.Linear(neur, output_size),
            )

        def forward(self, x):
            x = x - feature_means  # center the features
            out = self.layers(x)
            return out

    model = Model(input_size)

    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    for e in range(n_steps):
        optimizer.zero_grad()  # clear gradients left over from the previous step
        outputs = model(X)[:, 0]
        cost = criterion(outputs, y)
        cost.backward()

        optimizer.step()

        # A logit above 0 corresponds to a predicted probability above 0.5.
        pred_labels = outputs > 0
        is_correct = torch.eq(pred_labels, y.bool()).float()
        accuracy = torch.mean(is_correct).item()
        print(e, cost.item(), accuracy)

    return model
``````

Thanks for your help

Hello Karsten!

Hopefully some forum experts (that is, people other than me)
will have some informed suggestions about how you might
improve your network. In the meantime, here are some comments
to get things started.

First, as I understand it, you have (only) 250 samples. Each
sample has five properties (inputs), which, I assume, could be
something like {height, weight, blood pressure, cholesterol,
number of cigarettes smoked per day}. Just for concreteness,
I’ll think of them as numbers between 0 and 1. And then each
sample has a single binary output (e.g., {unhealthy, healthy})
which I’ll imagine is coded as a 0 or 1. Is this a reasonable
approximation of what you have?

Some comments on your model. If I understand how `Sequential`
works, I think there is no benefit in having two `Linear`s (fully
connected layers) in a row. The second one is redundant.
(In essence, you are multiplying two `neur` x `neur` weight
matrices together to get a single `neur` x `neur` matrix.)
I imagine that having the extra, redundant weights could slow
down or otherwise confuse your training. In a similar vein,
`ReLU` followed by `PReLU` is equivalent to just a single `ReLU`
(`PReLU` leaves non-negative inputs unchanged). Also, `Tanhshrink`
followed by `ReLU` is
similar (although not identical) to just a `ReLU` when taking into
account the bias weights in the previous `Linear` layer.
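
To make the redundancy concrete, here is a minimal sketch (not from the original code; the names `lin1`, `lin2`, and `merged` are made up for illustration). It folds two consecutive `Linear` layers into a single one and checks that the outputs agree up to floating-point error. It also checks that `ReLU` followed by a default `PReLU` returns the same values as `ReLU` alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
neur = 15

# Two fully connected layers in a row, with no activation in between.
lin1 = nn.Linear(neur, neur)
lin2 = nn.Linear(neur, neur)

# nn.Linear computes x @ W.T + b, so lin2(lin1(x)) equals
# x @ (W2 @ W1).T + (W2 @ b1 + b2): a single affine map.
merged = nn.Linear(neur, neur)
with torch.no_grad():
    merged.weight.copy_(lin2.weight @ lin1.weight)
    merged.bias.copy_(lin2.weight @ lin1.bias + lin2.bias)

x = torch.randn(8, neur)
with torch.no_grad():
    max_diff = (lin2(lin1(x)) - merged(x)).abs().max().item()

    # PReLU only changes negative inputs, and ReLU has already removed them.
    relu_out = torch.relu(x)
    same_as_relu = torch.allclose(nn.PReLU()(relu_out), relu_out)

print(max_diff, same_as_relu)
```

The merged layer has 240 parameters instead of 480 and represents exactly the same family of functions, so the second `Linear` adds parameters without adding expressive power.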

Even trimming those redundancies out, it still seems like you
have an awful lot of layers and parameters for a five-feature
binary classification problem. I would be worried about
overfitting, especially with only 250 samples. (You have
almost as many trainable parameters as you have numbers
in your data set.)
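
As a rough check on that comparison, the parameter count of the posted model can be worked out by hand. This sketch assumes the layer list exactly as posted (one 5-to-15 `Linear`, six 15-to-15 `Linear`s, one 15-to-1 `Linear`, and two default `PReLU`s with one learnable slope each); the helper `linear_params` is a made-up name.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

input_size, neur, output_size = 5, 15, 1

n_linear = (
    linear_params(input_size, neur)        # first layer
    + 6 * linear_params(neur, neur)        # six hidden 15 x 15 layers
    + linear_params(neur, output_size)     # output layer
)
n_prelu = 2  # each default nn.PReLU() has a single learnable slope
total_params = n_linear + n_prelu

n_samples = 250
numbers_in_data = n_samples * (input_size + 1)  # five features plus one label

print(total_params, numbers_in_data)  # prints: 1548 1500
```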

If you aren’t already doing so, you should split your data set
into a training set and a validation set, e.g., 200 samples
for training and the remaining 50 samples for validation.
Train your model on the training set only,
and then see how well it works on your “out-of-sample”
validation set. You do want to look at your model’s accuracy
on the training set, but you must also check it on the
validation set. If you get good accuracy on the former,
but not so good on the latter, you are overfitting.

If I understand your code correctly, it looks to me like you
are using your entire training set for each step of your
`Adam` optimization algorithm. (That is, you are using a
“batch size” of 250.) In general, breaking your training
set up into smaller batches is thought to work better.
First, you do less computation per optimizer step, so your
training should run faster. But more importantly, even
though using your whole training set as a single batch gives
the most accurate loss-function gradient for that optimization
step, it is thought that using the noisier, less accurate
gradients you get from smaller (randomly chosen) batches
helps keep the gradient-descent algorithm from “getting stuck,”
and the noise is
thought to also help reduce overfitting.

I don’t have a good rationale for choosing a batch size, so
maybe the experts can help out, but I would think (especially
with the small size of your training set) somewhere in the
range of 5 to 25 might be appropriate.
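
For reference, mini-batch training could look roughly like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (random features in [0, 1] with a made-up label rule), the model is a small placeholder rather than the one from the question, and `batch_size = 20` is just one value from the range mentioned above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for 200 training samples with 5 features (assumption).
X_train = torch.rand(200, 5)
y_train = (X_train.sum(dim=1) > 2.5).float()

# Small placeholder model, not the model from the question.
model = nn.Sequential(nn.Linear(5, 15), nn.ReLU(), nn.Linear(15, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

batch_size = 20  # hypothetical choice
loader = DataLoader(
    TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True
)

n_epochs = 5
steps_taken = 0
for epoch in range(n_epochs):
    # shuffle=True draws a fresh random ordering of the samples every epoch.
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb)[:, 0], yb)
        loss.backward()
        optimizer.step()  # one optimizer step per mini-batch
        steps_taken += 1

print(steps_taken)
```

With 200 samples and a batch size of 20, each epoch takes 10 optimizer steps instead of one.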

It would be helpful to us if you could show us plots (or tables)
of the training and validation loss and accuracy as a function
of training iteration.
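
One possible way to collect such a table is sketched below. The data and the small model are synthetic placeholders (assumptions, not the poster's setup), and `evaluate` and `history` are made-up names. The loop uses full-batch steps, like the posted code, to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 200 training and 50 validation samples (assumption).
X = torch.rand(250, 5)
y = (X.sum(dim=1) > 2.5).float()
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

model = nn.Sequential(nn.Linear(5, 15), nn.ReLU(), nn.Linear(15, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def evaluate(model, X, y):
    """Return (loss, accuracy) on the given data without tracking gradients."""
    model.eval()
    with torch.no_grad():
        logits = model(X)[:, 0]
        loss = criterion(logits, y).item()
        accuracy = ((logits > 0) == y.bool()).float().mean().item()
    model.train()
    return loss, accuracy

n_steps = 60
history = []
for step in range(n_steps):
    optimizer.zero_grad()
    loss = criterion(model(X_train)[:, 0], y_train)
    loss.backward()
    optimizer.step()
    # Record both sets every 10 steps and at the final step.
    if step % 10 == 0 or step == n_steps - 1:
        history.append((step, *evaluate(model, X_train, y_train),
                        *evaluate(model, X_val, y_val)))

print("step  train_loss  train_acc  val_loss  val_acc")
for row in history:
    print("%4d  %10.4f  %9.3f  %8.4f  %7.3f" % row)
```

A training accuracy that keeps rising while the validation accuracy stalls or falls is the pattern to watch for.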

(As an aside, in fairness to the broader world of machine learning,
for a problem this seemingly simple, you might want to compare
your neural-network results with an “old-school” approach such
as adaboosting decision stumps or building a support-vector machine,
or even just running a linear regression.)
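
As a sketch of the simplest of those baselines, a linear regression on the 0/1 labels can be fit with plain NumPy and thresholded at 0.5. The data here is synthetic (an assumption), and `add_bias` is a made-up helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 250 samples, 5 features in [0, 1] (assumption).
X = rng.random((250, 5))
y = (X.sum(axis=1) > 2.5).astype(np.float64)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

def add_bias(X):
    """Append a column of ones so the fit includes an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

# Ordinary least squares on the 0/1 labels.
coef, *_ = np.linalg.lstsq(add_bias(X_train), y_train, rcond=None)

# Treat a fitted value above 0.5 as a prediction of class 1.
val_pred = add_bias(X_val) @ coef > 0.5
val_accuracy = np.mean(val_pred == (y_val > 0.5))

print(val_accuracy)
```

If a baseline like this lands near the network's validation accuracy, the extra layers are probably not buying much.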

Have fun!

K. Frank

Thank you very much Frank,
I optimized the hidden layers as mentioned. But how can I prevent overfitting? And how do you separate the training data from the data used to measure accuracy? My approach would be as follows:

``````
import numpy as np

def train_test_split(M):
    train_data = M[:round(len(M)*0.8)]
    test_data = M[:round(len(M)*0.2)]
    return train_data, test_data
``````

Hi Karsten!

As far as your data splitting goes, you should try what you have
and see. Take a look at your split data and see if it makes sense
to you.

(Hint: Your code will use the first 80% of your data for training,
and the first 20% of your data for testing. So your testing data
will be included in your training, and that isn’t really what you
want.)
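
One way to get two disjoint sets, as a minimal sketch: shuffle the row indices once, then give the first 80% to training and the rest to testing. The `train_fraction` and `seed` parameters are additions for illustration, and the array `M` below is dummy data.

```python
import numpy as np

def train_test_split(M, train_fraction=0.8, seed=0):
    """Shuffle the rows of M, then split them into disjoint train and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(M))
    n_train = round(len(M) * train_fraction)
    train_data = M[indices[:n_train]]   # first 80% of the shuffled rows
    test_data = M[indices[n_train:]]    # the remaining rows, none shared
    return train_data, test_data

# Dummy data: 250 rows, 5 features plus 1 label column.
M = np.arange(250 * 6, dtype=np.float32).reshape(250, 6)
train_data, test_data = train_test_split(M)
print(train_data.shape, test_data.shape)  # prints: (200, 6) (50, 6)
```

Shuffling matters if the rows are ordered in some way, for example all the healthy samples first.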

It’s hard to say about over-fitting. You haven’t really shared