Why are my predictions all zeros during the training phase of my neural network?

Sunakutra76 · May 1, 2021, 12:19pm

Hello,

I am a beginner with PyTorch, and I’m working on a small project that consists of predicting some output. To do this I’m using a neural network that has a few layers with simple transforms.

When I run my code, I hit the following problem: every prediction of my neural network is 0 (sometimes the first batch gives other values, when I’m lucky enough…)

Here is the code of my project:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
from torchvision.transforms import Lambda, Compose

import matplotlib.pyplot as plt
import numpy as np

n = 20      # Number of direction variables summarizing the payoff variance
d = 20      # Dimension of the option

N_data_training = 5000      # Size of the dataset used to train the neural network
N_data_test = 2500      # Size of the dataset used to test the accuracy of the trained neural network

type_product = "basket call"    # Type of derivative product
S0 = 100      # Spot price
K = 100     # Strike
T = 1     # Maturity
r = 0.05      # Risk-free interest rate
sigma = 0.2     # Volatility
# Nmc = 1000      # Number of paths generated for the Monte Carlo estimate
learning_rate = 1e-3

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

def payoff_derivative(N_data, type_product, d, S0, r, sigma, T): #### pay-offs test
  payoff = np.zeros(N_data)
  Z = np.zeros((N_data, d))

  if type_product == "basket call":
    for i in range(N_data):
      Z[i, :] = S0*np.exp((r-0.5*sigma**2)*T + sigma*np.sqrt(T)*np.random.randn(d))
      payoff[i] = np.maximum((1/d)*np.sum(Z[i, :], axis=0) - K, 0)

  return (Z, payoff)

pure_data_training = payoff_derivative(N_data_training, type_product, d, S0, r, sigma, T) ##### payoffs used to train the neural network
pure_data_test = payoff_derivative(N_data_test, type_product, d, S0, r, sigma, T) ##### payoffs used for the accuracy test


class Creation_Dataset(Dataset):
  def __init__(self, pure_data):
    self.Z = pure_data[0]
    self.payoff = pure_data[1]

  def __len__(self):
    return len(self.payoff)

  def __getitem__(self, idx):
    return torch.tensor(self.Z[idx],dtype=torch.float), torch.tensor(self.payoff[idx],dtype=torch.float)


dataset_training = Creation_Dataset(pure_data_training)
dataset_test = Creation_Dataset(pure_data_test)

# Create the data loaders
batch_size = 100
train_dataloader = DataLoader(dataset_training, batch_size)
test_dataloader = DataLoader(dataset_test, batch_size)

for X, y in train_dataloader:
    print("Shape of X : ", X.shape, X.dtype)
    print("Shape of y: ", y.shape, y.dtype)
    print(X[0])
    print(y[0])
    break

class NeuralNetwork(nn.Module):
    def __init__(self): ###### method in which the layers of the network are defined
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten() ########## converts to a 1D tensor
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(d, n, bias=False),
            nn.Linear(n, 6),
            nn.ReLU(),
            nn.Linear(6, 15),
            nn.ReLU(),
            nn.Linear(15, 10),
            nn.ReLU(),
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Linear(5, 1),
            nn.ReLU()
          )

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return torch.reshape(logits, (-1,))

model = NeuralNetwork().to(device)
print(model)

loss_fn = nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the model's device, then compute prediction and loss
        X, y = X.to(device), y.to(device)
        print("Shape of X : ", X.shape, X.dtype)
        print("Shape of y: ", y.shape, y.dtype)
        pred = model(X)
        print(pred)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test(dataloader, model):
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += float((pred == y).sum()) 
    size = len(dataloader.dataset)
    test_loss /= size
    correct /= size
    print(f"Accuracy: {100*correct:.2f}%, Avg loss (sqrt(MSE)) : {np.sqrt(test_loss):.3f} \n")

epochs = 2
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model)

So, using a DataLoader that creates batches of batch_size, I train my neural network in the train_loop() function and print out the tensor of predictions, and it turns out that all my predictions are zeros. I’ve seen a few topics from people facing this kind of issue with classification problems, but I’m still trying to solve it in my case and get correct predictions.

Thank you for your time

KFrank · May 1, 2021, 4:02pm

Hi Sunakutra!

I haven’t looked at all of your code, but I do see some issues, noted
below:

On nn.Linear(d, n, bias=False) followed directly by nn.Linear(n, 6):
this won’t “break” your model (or cause it to return zeroes), but
two Linears in a row, without an intervening non-linearity, are
redundant, and should be replaced by a single Linear (d, 6).
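
To illustrate the redundancy, here is a minimal sketch (not code from the thread) showing that the two stacked Linears compute the same function as one Linear whose weight is the product of the two weight matrices. The names first, second and merged are made up for the illustration, and the sizes d = n = 20 are taken from the original script.

```python
import torch
from torch import nn

torch.manual_seed(0)
d, n = 20, 20  # sizes from the original post

# The two stacked layers from the original model.
first = nn.Linear(d, n, bias=False)
second = nn.Linear(n, 6)

# Hypothetical single layer built to be equivalent:
# second(first(x)) = x @ W1.T @ W2.T + b2 = x @ (W2 @ W1).T + b2
merged = nn.Linear(d, 6)
with torch.no_grad():
    merged.weight.copy_(second.weight @ first.weight)
    merged.bias.copy_(second.bias)

x = torch.randn(10, d)
two_step = second(first(x))
one_step = merged(x)

# Largest element-wise difference; only float rounding error remains.
max_diff = (two_step - one_step).abs().max().item()
print("max difference:", max_diff)
```

The extra layer therefore adds parameters without adding expressive power, which is why a single Linear (d, 6) is enough.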

This could be your problem. Using a final ReLU layer to output your
predictions is a poor choice. If your last Linear happens to generate
negative values, ReLU will output zero and will have zero gradient
when you backpropagate, so your optimizer won’t know in what
direction to move to get you “unstuck” out of the flat (zero gradient)
part of ReLU.

At the least you could try something like LeakyReLU, but I would
recommend removing the final ReLU, and letting your MSELoss and
backpropagation “learn” to predict positive values. (Here I assume
that you are training with “target” prices that are always positive.)
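
A small sketch of the "stuck" case, under contrived assumptions: the last Linear is filled by hand with negative weights so that its output is always negative. The helper weight_grad and the tensors hidden and target are made up for the illustration. It compares the gradient reaching the last layer with a final ReLU, a final LeakyReLU, and no final activation.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Contrived last layer: negative weights and bias force negative outputs.
last = nn.Linear(5, 1)
with torch.no_grad():
    last.weight.fill_(-1.0)
    last.bias.fill_(-1.0)

hidden = torch.rand(4, 5)        # non-negative, like the output of a preceding ReLU
target = torch.full((4,), 10.0)  # positive targets, like the payoffs

def weight_grad(final_activation):
    """Return predictions and the gradient of the loss w.r.t. last.weight."""
    last.zero_grad()
    pred = final_activation(last(hidden)).reshape(-1)
    loss = nn.functional.mse_loss(pred, target, reduction="sum")
    loss.backward()
    return pred.detach(), last.weight.grad.clone()

pred_relu, grad_relu = weight_grad(nn.ReLU())
pred_leaky, grad_leaky = weight_grad(nn.LeakyReLU(0.01))
pred_none, grad_none = weight_grad(nn.Identity())

print("ReLU      pred:", pred_relu, "grad:", grad_relu)
print("LeakyReLU pred:", pred_leaky, "grad:", grad_leaky)
print("no final  pred:", pred_none, "grad:", grad_none)
```

With the final ReLU, the predictions are all zero and so is the gradient, so an optimizer step leaves the last layer unchanged. The other two variants give a non-zero gradient that can pull the predictions toward the targets.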

Your accuracy computation, correct += float((pred == y).sum()), will
fail. You are testing for exact equality between two floating-point
numbers, which will almost never be true (except for some special
cases). Since you’re predicting
continuous values (rather than classes or categories) you should
use something like your MSELoss as a figure of merit. You could
also count the number of “correct” predictions by defining “correct”
to be a prediction that is within a certain absolute amount or within
a certain percentage of the ground-truth value.
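
One possible way to write such a tolerance-based count, as a sketch only: the function name within_tolerance and the thresholds rel_tol = 0.05 and abs_tol = 0.5 are arbitrary choices for the illustration, not values from the thread.

```python
import torch

def within_tolerance(pred, target, rel_tol=0.05, abs_tol=0.5):
    """Fraction of predictions that are within abs_tol of the target
    OR within rel_tol (as a fraction) of the target's magnitude."""
    err = (pred - target).abs()
    close = (err <= abs_tol) | (err <= rel_tol * target.abs())
    return close.float().mean().item()

# Made-up predictions and ground-truth values.
pred = torch.tensor([0.0, 4.9, 10.0, 20.0])
target = torch.tensor([0.3, 5.0, 12.0, 20.5])

exact = (pred == target).float().mean().item()       # exact equality
tolerant = within_tolerance(pred, target)            # tolerance-based "accuracy"
rmse = torch.sqrt(torch.mean((pred - target) ** 2)).item()

print(f"exact match: {exact:.2f}, within tolerance: {tolerant:.2f}, RMSE: {rmse:.3f}")
```

Here exact equality reports no correct prediction at all, while three of the four predictions fall inside the tolerance.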

Best.

K. Frank

Sunakutra76 · May 1, 2021, 4:55pm

Hi KFrank,

Thank you for your answer. I fixed all the points you mentioned: I removed the last nn.ReLU() in my model, and tried both Tanh and ReLU for the remaining activation functions. In fact, it seems that the model returns the same prediction for every input of my batches during the training phase, as shown in the following test with batches of batch_size = 10:

[Screenshot of the printed predictions]

At least I have positive values, but I still can’t see what the problem is here. I even tried printing out the weights of the model for each linear transform, and avoiding ReLU altogether, but the problem persists.

Thank you

cskarthik7 · May 1, 2021, 5:41pm

Also, I would like to mention that I tried to run your code, and your labels are floating-point values, whereas the labels should be fixed values so that the model is able to create a mapping function between the input and the output. @Sunakutra76

KFrank · May 1, 2021, 8:00pm

Hi Sunakutra!

(As an aside, please don’t post screenshots of textual information. It
breaks accessibility, searchability, and copy-paste.)

It works for me, at least in so far as I don’t get the same prediction for
every input.

I copied your code, made a couple of the changes you made, and I can
run it and get non-identical predictions.

Two suggestions: You might probe the bias in your final Linear
layer. If the output of the preceding layer gets “stuck” at zero, then
all of your final-layer predictions will just be the final layer’s bias.

Also, verify that your batch doesn’t consist (for some weird reason)
of 10 copies of the same input.
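
Both checks can be sketched as follows. This is a toy model, not the one from the thread: the hidden layer's bias is set by hand to a large negative value so that the ReLU output is stuck at zero, and the variable names are made up for the illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for the real model.
model = nn.Sequential(
    nn.Linear(20, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
)

# Contrived: a very negative bias makes every hidden pre-activation negative,
# so the ReLU outputs only zeros.
with torch.no_grad():
    model[0].bias.fill_(-100.0)

batch = torch.randn(10, 20)

# Check 1: does the batch contain duplicated inputs?
n_unique_rows = torch.unique(batch, dim=0).shape[0]

# Check 2: is the input to the last layer stuck at zero?
with torch.no_grad():
    hidden = model[:-1](batch)   # slicing a Sequential gives a sub-Sequential
    pred = model(batch).reshape(-1)

hidden_is_dead = bool((hidden == 0).all())
final_bias = model[-1].bias.detach()
pred_equals_bias = torch.allclose(pred, final_bias.expand_as(pred))

print("unique rows in batch:", n_unique_rows, "of", batch.shape[0])
print("hidden output all zero:", hidden_is_dead)
print("predictions:", pred)
print("final bias: ", final_bias)
```

If the second check is true, every prediction equals the final bias no matter what the input is, which would look exactly like "the same prediction for every input".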

Here’s a script that shows non-identical predictions (for random input
data) and shows how to look at the model layers and the final bias:

import torch
from torch import nn
print (torch.__version__)

_ = torch.manual_seed (2021)

n = 20
d = 20
batch_size = 10

class NeuralNetwork(nn.Module):
    def __init__(self): ###### method in which the layers of the network are defined
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten() ########## converts to a 1D tensor
        self.linear_relu_stack = nn.Sequential(
            # nn.Linear(d, n, bias=False),
            # nn.Linear(n, 6),
            nn.Linear(d, 6),
            nn.ReLU(),
            nn.Linear(6, 15),
            nn.ReLU(),
            nn.Linear(15, 10),
            nn.ReLU(),
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Linear(5, 1),
            # nn.ReLU()
        )
    
    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return torch.reshape(logits, (-1,))

model = NeuralNetwork()

batch_of_samples = torch.randn (batch_size, d)

print ("model (batch_of_samples) = ...")
print (model (batch_of_samples))

print ("model = ...")
print (model)
print ("model._modules['linear_relu_stack'][8] =", model._modules['linear_relu_stack'][8])
print ("model._modules['linear_relu_stack'][8].bias =", model._modules['linear_relu_stack'][8].bias)

And here is its output:

1.7.1
model (batch_of_samples) = ...
tensor([0.3791, 0.4269, 0.3727, 0.4019, 0.3934, 0.4013, 0.3771, 0.3855, 0.3729,
        0.3700], grad_fn=<ViewBackward>)
model = ...
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=20, out_features=6, bias=True)
    (1): ReLU()
    (2): Linear(in_features=6, out_features=15, bias=True)
    (3): ReLU()
    (4): Linear(in_features=15, out_features=10, bias=True)
    (5): ReLU()
    (6): Linear(in_features=10, out_features=5, bias=True)
    (7): ReLU()
    (8): Linear(in_features=5, out_features=1, bias=True)
  )
)
model._modules['linear_relu_stack'][8] = Linear(in_features=5, out_features=1, bias=True)
model._modules['linear_relu_stack'][8].bias = Parameter containing:
tensor([0.2732], requires_grad=True)

If the problem persists, could you post a complete, runnable script
that sets torch.manual_seed (so we can reproduce the initialization
of your model weights), builds an input tensor from explicit values (so
that when we copy-paste your script we’ll be building the same tensor),
and prints out one batch of identical predictions?

Best.

K. Frank

KFrank · May 1, 2021, 8:08pm

Hi Karthik!

I don’t think this is correct.

It appears that Sunakutra is not building a classifier (where the labels
would be fixed, integer class labels), but, rather, is performing a
“regression,” where both the predictions and target (not really labels)
are continuous variables that are supposed to be close to one another.

He is using MSELoss, which is appropriate for training continuous
predictions against continuous targets (rather than something like
CrossEntropyLoss which would be appropriate for building a classifier
that has fixed categorical labels for the target).
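
To make the distinction concrete, here is a minimal sketch with made-up tensors showing the kind of target each loss expects: continuous floats for MSELoss, integer class indices for CrossEntropyLoss.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Regression: one continuous prediction per sample, continuous float targets.
pred = torch.randn(4)
target = torch.tensor([0.0, 3.7, 12.25, 0.4])
mse = nn.MSELoss()(pred, target)

# Classification: one score per class for each sample, integer class labels.
logits = torch.randn(4, 3)            # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])   # class indices, dtype int64
ce = nn.CrossEntropyLoss()(logits, labels)

print("MSELoss target dtype:", target.dtype, "loss:", mse.item())
print("CrossEntropyLoss target dtype:", labels.dtype, "loss:", ce.item())
```

The payoffs in the original script are floats of the first kind, so floating-point targets are what the regression setup needs.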

Best.

K. Frank