Neural Network only gives outputs of 0

ApeelingPotato · April 17, 2020, 10:30pm

Hi there!

I am trying to train a neural network with a tensor of 1040 float inputs and a single float output. No matter what I do, the predicted value comes out as 0.
I think the problem I am having is with training the network. My Net class is below. The interneuron_count is 64.

    import torch.nn as nn
    import torch.nn.functional as F

    interneuron_count = 64  # width of the two hidden layers

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(1040, interneuron_count)  # fully connected layer 1 with 1040 inputs
            self.fc2 = nn.Linear(interneuron_count, interneuron_count)
            self.fc3 = nn.Linear(interneuron_count, 1)  # final fully connected layer with one output

        def forward(self, x):
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return F.log_softmax(x, dim=0)

This is how I train my neural network:

    for epoch in range(EPOCHS):
        for i in tqdm(range(0, len(train_inputs), BATCH_SIZE)):
            batch_X = train_inputs[i:i+BATCH_SIZE]
            batch_Y = train_outputs[i:i+BATCH_SIZE]

            net.zero_grad()
            outputs = net(batch_X)
            loss = loss_function(outputs, batch_Y)
            loss.backward()

            if epoch > learning_rate_cutoff:  # this acts as a decaying learning rate for our system
                for param_group in optimizer.param_groups:
                    param_group["lr"] = 0.0001  # set it on the optimizer; rebinding a local variable has no effect
            optimizer.step()

When I test the NN using print(net(test_inputs[0])) I get
tensor([0.], grad_fn=<LogSoftmaxBackward>)

instead of a value of around 1.25901

Any help with this would be greatly appreciated! Thank you
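Separately from the zero-output problem: in a training loop like the one in this question, a learning-rate drop only takes effect if it is written into the optimizer. Assigning to a plain variable such as `learning_rate` after the optimizer was created changes nothing. A minimal sketch using `torch.optim.lr_scheduler.MultiStepLR` is shown here; the stand-in model, the starting rate of 0.001 and the epoch counts are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical values, chosen only to illustrate the schedule
EPOCHS = 6
learning_rate_cutoff = 3

model = nn.Linear(1040, 1)  # stand-in for the thread's Net
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Multiply the lr by 0.1 starting at epoch cutoff + 1, matching `epoch > learning_rate_cutoff`
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[learning_rate_cutoff + 1], gamma=0.1
)

lr_history = []
for epoch in range(EPOCHS):
    lr_history.append(optimizer.param_groups[0]["lr"])  # lr in effect during this epoch
    # ... the per-batch zero_grad() / backward() / optimizer.step() loop would go here ...
    optimizer.step()   # placeholder step (no gradients) so the scheduler follows an optimizer step
    scheduler.step()   # advance the schedule once per epoch

print(lr_history)
```

The scheduler keeps the decay logic out of the inner batch loop, so it runs once per epoch rather than once per batch.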

KFrank · April 18, 2020, 12:23am

Hi Brian!

ApeelingPotato:

No matter what I do the predicted value comes out as 0
I think the problem I am having is with teaching the network.
            self.fc3 = nn.Linear(interneuron_count, 1)  # final fully connected layer with one output
...
            x = self.fc3(x)
            return F.log_softmax(x, dim=0)
When I test the NN using print(net(test_inputs[0])) I get
tensor([0.], grad_fn=<LogSoftmaxBackward>)

Your problem is that you are passing a length-one vector to
softmax(). softmax() returns a vector of probabilities that
sum to one. Since there is only one probability in the sum, it
will always be 1.0. log (1.0) = 0.0, so, analogously,
log_softmax() will always return 0.0.
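This is easy to confirm by applying softmax() and log_softmax() to single-element tensors directly. The first value is the target quoted in the question; the others are arbitrary, and all give the same result.

```python
import torch
import torch.nn.functional as F

results = []
for value in [1.25901, -3.0, 42.0]:
    x = torch.tensor([value])            # a length-one vector, like the output of Linear(64, 1)
    probs = F.softmax(x, dim=0)          # only one probability, so it must be 1.0
    log_probs = F.log_softmax(x, dim=0)  # log(1.0) = 0.0
    results.append((probs.item(), log_probs.item()))
    print(value, probs.item(), log_probs.item())
```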

If this network is for a binary classification problem, and your
single output is supposed to indicate whether your input is in
class-“0” or class-“1”, then you should have

    return F.sigmoid (x)

and use BCELoss for your loss function (or just return x without
the sigmoid(), and use BCEWithLogitsLoss).
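The two options compute the same loss; BCEWithLogitsLoss simply folds the sigmoid() into the loss, which is generally more numerically stable. A small sketch with made-up logits and 0/1 targets (both hypothetical):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(10, 1)                     # hypothetical raw fc3 outputs for a batch of 10
targets = torch.randint(0, 2, (10, 1)).float()  # class-"0" / class-"1" labels, as floats

loss_sigmoid_bce = nn.BCELoss()(torch.sigmoid(logits), targets)  # option 1: sigmoid() + BCELoss
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)       # option 2: raw x + BCEWithLogitsLoss

print(loss_sigmoid_bce.item(), loss_with_logits.item())  # equal up to float rounding
```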

As an aside, in return F.log_softmax(x, dim=0), dim = 0 is
the batch dimension. I’m guessing in the example you gave that
your batch size is 1. If it did make sense to use log_softmax()
(which it doesn’t, but would, for example, if your last layer were
Linear(interneuron_count, 10)) you would want something like
log_softmax(x, dim = 1) or log_softmax(x, dim = -1).
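To see why the dimension matters, here is a sketch with a hypothetical batch of 4 samples and 10 classes: normalizing over dim=1 gives each sample its own probabilities, while dim=0 normalizes across the batch and mixes samples together.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)  # hypothetical: batch of 4 samples, 10 classes each

per_sample = F.log_softmax(logits, dim=1)    # normalizes over the 10 classes of each sample
last_dim = F.log_softmax(logits, dim=-1)     # same as dim=1 for a 2-D tensor
across_batch = F.log_softmax(logits, dim=0)  # normalizes over the 4 samples instead

print(per_sample.exp().sum(dim=1))    # each sample's probabilities sum to 1
print(across_batch.exp().sum(dim=1))  # generally not 1: probability is spread across samples
```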

Best.

K. Frank

actuallyaswin · April 18, 2020, 12:51am

To add to KFrank’s post, this deals with the fundamental difference between sigmoid and softmax. A more detailed comparison can be found here on StackExchange; specifically with regard to neural networks, this post on StackExchange is also helpful.
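One concrete way to see the relationship: sigmoid() applied to a single logit x gives the same probability as a two-class softmax() over the logits (x, 0). The sketch checks this numerically for a few arbitrary values.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])                        # arbitrary single-logit outputs
two_class = torch.stack([x, torch.zeros_like(x)], dim=1)  # shape (3, 2): logits (x, 0) per sample

p_sigmoid = torch.sigmoid(x)                   # probability of class "1" from one logit
p_softmax = F.softmax(two_class, dim=1)[:, 0]  # probability of the first of two classes

print(p_sigmoid)
print(p_softmax)
```

This is also why softmax() over a single output is degenerate: with one logit there is nothing to compare it against, so the result is always 1.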

ApeelingPotato · April 18, 2020, 7:56am

Hi KFrank! Thanks for getting back to me.

Will this return a float out of the neural network?

return F.sigmoid (x)

I'm only getting 1s out of the NN.

I'm feeding batches of ten into the network. I also started out using optim.Adam and the MSELoss function.

KFrank · April 18, 2020, 11:32am

Hi Brian!

If x is a FloatTensor, it will return a FloatTensor. In general, it
will return a tensor of the type passed in (if the type is supported).

Mathematically, sigmoid() will “saturate” and become exactly 1
when x becomes inf (and become 0 when x is -inf).

For floats, sigmoid() becomes exactly 1.0 somewhere around
x = 17.0.
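This saturation is easy to check. The sketch compares float32 and float64 at a few points; the exact float32 threshold can depend on how the backend computes sigmoid(), so treat the printed values around 17 as illustrative.

```python
import torch

values = [5.0, 10.0, 17.0, 20.0, 30.0]
saturated32 = {}
saturated64 = {}
for value in values:
    # .item() gives a Python float; == 1.0 is True only if sigmoid rounded to exactly 1
    saturated32[value] = torch.sigmoid(torch.tensor(value, dtype=torch.float32)).item() == 1.0
    saturated64[value] = torch.sigmoid(torch.tensor(value, dtype=torch.float64)).item() == 1.0
    print(f"x = {value:5.1f}  float32 saturated: {saturated32[value]}  "
          f"float64 saturated: {saturated64[value]}")
```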

Check the values of x going into sigmoid().

How large are your inputs? Do you only get 1s out even on your
first pass before you start training?

By default, your Linear layers are constructed with “sensible”
random values for their weights. If your inputs are of order 1,
then before you start training (that is, before the initial values of
the weights have changed) the x going into sigmoid() should
have sensible values, and you shouldn’t be getting all exactly 1s.
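A sketch of that check: the thread's layers, with forward() returning the raw x that would go into sigmoid(), fed a batch of order-1 inputs before any training. `torch.randn` standing in for the real sensor data is an assumption; with real data you would pass a batch from `train_inputs` instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

interneuron_count = 64

class SigmoidNet(nn.Module):  # same layers as the thread's Net; sigmoid() is applied outside
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1040, interneuron_count)
        self.fc2 = nn.Linear(interneuron_count, interneuron_count)
        self.fc3 = nn.Linear(interneuron_count, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # the x that would go into sigmoid()

torch.manual_seed(0)
net = SigmoidNet()                # default ("sensible") random initialization
batch = torch.randn(10, 1040)     # hypothetical order-1 inputs, batch of 10
with torch.no_grad():
    pre_sigmoid = net(batch)
    probs = torch.sigmoid(pre_sigmoid)

print("largest |x| into sigmoid():", pre_sigmoid.abs().max().item())
print("sigmoid outputs range:", probs.min().item(), "to", probs.max().item())
```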

If the output of your network starts out not being all 1s, but then
becomes all 1s after training for a while, then your training is
pushing your network to return all 1s (which may or may not
make sense, depending on what you are training your network
to do).

What are typical batch_Y (target) values? And what do they
mean? What are you trying to train your network to do?

Best.

K. Frank

ApeelingPotato · April 18, 2020, 12:08pm

The inputs are acceleration and gyroscope data plus the timing of that data. The purpose of the NN is to study how effectively a NN can measure stride lengths. Thankfully, I am getting values greater than one now; I think I was not running the training for long enough. The problem now is that, for a test batch of 200 inputs, the output stays the same.
Typical values range from 1.2514 to 1.5568. Is there any way to get a unique float output for each input in this range?

KFrank · April 18, 2020, 1:54pm

Hi Brian!

Okay, that sounds reasonable.

But what are the batch_Y (targets)? Are they known stride lengths?

We’re not on the same page here. You can’t get a value greater
than one out of a sigmoid().

I don’t really understand what you mean. Do you mean that you have
a batch size of 200, and for each sample in the batch you get exactly
the same value for the single, scalar output? (Also, I thought you
said earlier that your batch size was ten.)

Do you mean that the values of your outputs remain exactly the
same (for the same inputs) even as you train your network?

Typical values of what? Are you referring to input values? Output
values?

What do you mean by a “unique float output?” A generic floating-point
number is pretty close to unique. (There are approximately 2^32
different 32-bit floating-point numbers, so any two generic values will
almost certainly differ.)

Best.

K. Frank

ApeelingPotato · April 18, 2020, 2:25pm

Sorry for the confusion in the last post and thank you for being patient with me.

But what are the batch_Y (targets)? Are they known stride lengths?

These are known stride lengths (outputs) in the range 1.2514 to 1.5568, fed in batches of 10 as a tensor.

I don’t really understand what you mean. Do you mean that you have a batch size of 200, and for each sample in the batch you get exactly the same value for the single, scalar output?

By the test batch, I meant separate data that I compare my NN against. It has 200 inputs, each a tensor of shape (1, 1040), with their respective outputs.

The issue I have is that when I feed an input tensor into the neural network, it outputs a tensor containing the same single float value for every input. I want the output to be specific to each input. The training and testing code is as follows, with the NN class as above:

    val_size = int(len(input_tensor) * test_partition)
    train_inputs = input_tensor[:-val_size]
    train_outputs = output_tensor[:-val_size]
    test_inputs = input_tensor[-val_size:]
    test_outputs = output_tensor[-val_size:]

    for epoch in range(EPOCHS):
        for i in tqdm(range(0, len(train_inputs), BATCH_SIZE)):
            batch_X = train_inputs[i:i+BATCH_SIZE]
            batch_Y = train_outputs[i:i+BATCH_SIZE]

            net.zero_grad()
            outputs = net(batch_X)
            loss = loss_function(outputs, batch_Y)
            loss.backward()

            if epoch > learning_rate_cutoff:  # this acts as a decaying learning rate for our system
                for param_group in optimizer.param_groups:
                    param_group["lr"] = 0.0001  # set it on the optimizer, not a local variable
            optimizer.step()
        print(f"\nEpoch: {epoch}. Loss: {loss}")

    print("testing")
    correct = 0
    total = 0
    with torch.no_grad():
        for i in tqdm(range(len(test_inputs))):
            predicted_class = net(test_inputs[i].view(-1, 1040))
            real_class = test_outputs[i]
            print(predicted_class, real_class)

            # exact equality of two floats almost never holds for a regression output
            if predicted_class == real_class:
                correct += 1
            total += 1
    print("\nAccuracy: ", round(correct / total, 3))

EPOCHS is 700
BATCH_SIZE is 10
test_partition is 0.1, with len(input_tensor) being 2000 (so val_size is 200)
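One thing worth ruling out with a setup like this (the posted code does not show the target shapes, so this is an assumption) is a shape mismatch between `outputs` and `batch_Y` under the MSELoss mentioned earlier. Net returns shape (BATCH_SIZE, 1); if the targets are a flat tensor of shape (BATCH_SIZE,), the loss broadcasts the pair to (BATCH_SIZE, BATCH_SIZE), compares every prediction with every target, and emits a UserWarning. That broadcast loss is minimized when every prediction equals the mean of the batch targets, i.e. the same output for every input. The numbers here are random stand-ins.

```python
import warnings
import torch
import torch.nn.functional as F

torch.manual_seed(0)
BATCH_SIZE = 10
outputs = torch.rand(BATCH_SIZE, 1) + 1.0      # stand-in predictions, shape (10, 1)
batch_Y = torch.rand(BATCH_SIZE) * 0.3 + 1.25  # stand-in stride lengths, shape (10,)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    broadcast_loss = F.mse_loss(outputs, batch_Y)  # (10, 1) vs (10,): compared as (10, 10)

matched_loss = F.mse_loss(outputs, batch_Y.view(-1, 1))  # one target per prediction

print(len(caught), "warning(s) raised")
print("broadcast loss:", broadcast_loss.item(), "matched loss:", matched_loss.item())
```

Reshaping the targets once, e.g. `output_tensor.view(-1, 1)`, before slicing into batches avoids the mismatch.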

KFrank · April 18, 2020, 10:18pm

Hello Brian!

Do I understand correctly?

You feed input1 (a single input tensor consisting of 1040 floats) into
your network, and you get outputA (a single float). You then feed in a
different input tensor, input2, but get the same output, outputA.

This seems very odd.

Could you post a simple, complete, runnable script, together with its
output, that demonstrates this issue?

I have in mind something like:

import torch
import ...

print (torch.__version__)

input1 = <some explicit numerical input tensor>
input2 = <another explicit numerical input tensor>
input3 = <yet another>

class Net(nn.Module):
    <your specific network that displays this issue>

net = Net()

output1 = net (input1)
print ('input1 = ...')
print (input1)
print ('output1 = ...')
print (output1)

output2 = net (input2)
print ('input2 = ...')
print (input2)
print ('output2 = ...')
print (output2)

output3 = net (input3)
print ('input3 = ...')
print (input3)
print ('output3 = ...')
print (output3)

print ('output2 - output1 =', output2 - output1)
print ('output3 - output1 =', output3 - output1)

The idea is that others should be able to run the script you post, and
by doing so, reproduce the output you post.
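As an illustration of what such a script might look like, here is one filled in with the Net from the first post (the log_softmax() version) and three random inputs. The random inputs are placeholders, not the poster's data. With that network every output is exactly 0, which is the symptom from the first post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

print(torch.__version__)

torch.manual_seed(0)
input1 = torch.randn(1040)  # placeholder inputs; real ones would be sensor readings
input2 = torch.randn(1040)
input3 = torch.randn(1040)

interneuron_count = 64

class Net(nn.Module):  # the network from the first post, unchanged
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1040, interneuron_count)
        self.fc2 = nn.Linear(interneuron_count, interneuron_count)
        self.fc3 = nn.Linear(interneuron_count, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x, dim=0)

net = Net()

with torch.no_grad():
    output1 = net(input1)
    output2 = net(input2)
    output3 = net(input3)

print('output1 =', output1)
print('output2 =', output2)
print('output3 =', output3)
print('output2 - output1 =', output2 - output1)
print('output3 - output1 =', output3 - output1)
```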

Best.

K. Frank

ApeelingPotato · April 19, 2020, 8:18pm

The error was on my end: I was running the NN as a classification network rather than as a regression network. I followed an example for predicting house prices at this link and have been able to find what I was looking for. Thank you so much for your time.
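For readers who arrive with the same problem, a minimal regression version of the network might look like the sketch below: the last layer's value is returned directly (no log_softmax() or sigmoid()), targets are shaped (N, 1) to match the output, and MSELoss measures the error. The synthetic data, learning rate and epoch count are assumptions, not the poster's setup, and this is not a reproduction of the linked house-price example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic stand-in data: 2000 samples of 1040 features, targets roughly in 1.25..1.55
input_tensor = torch.randn(2000, 1040)
output_tensor = 1.4 + 0.05 * input_tensor[:, :3].sum(dim=1, keepdim=True)  # shape (2000, 1)

test_partition = 0.1
val_size = int(len(input_tensor) * test_partition)
train_inputs, test_inputs = input_tensor[:-val_size], input_tensor[-val_size:]
train_outputs, test_outputs = output_tensor[:-val_size], output_tensor[-val_size:]

interneuron_count = 64

class RegressionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1040, interneuron_count)
        self.fc2 = nn.Linear(interneuron_count, interneuron_count)
        self.fc3 = nn.Linear(interneuron_count, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw value: no softmax or sigmoid for regression

net = RegressionNet()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_function = nn.MSELoss()
EPOCHS, BATCH_SIZE = 20, 10

def evaluate_mae():
    # mean absolute error on the held-out data, in the same units as the targets
    with torch.no_grad():
        return (net(test_inputs) - test_outputs).abs().mean().item()

mae_before = evaluate_mae()
for epoch in range(EPOCHS):
    for i in range(0, len(train_inputs), BATCH_SIZE):
        batch_X = train_inputs[i:i + BATCH_SIZE]
        batch_Y = train_outputs[i:i + BATCH_SIZE]  # shape (BATCH_SIZE, 1), same as the outputs
        optimizer.zero_grad()
        loss = loss_function(net(batch_X), batch_Y)
        loss.backward()
        optimizer.step()
mae_after = evaluate_mae()

with torch.no_grad():
    predictions = net(test_inputs)
print(f"test MAE before: {mae_before:.4f}, after: {mae_after:.4f}")
print("distinct predictions:", predictions.unique().numel(), "of", len(predictions))
```

For a regression output, an error measure such as MAE replaces the exact-match "accuracy" count, since a predicted float will essentially never equal the target exactly.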