NaN in Output and Loss

Hello,

I am a newbie in PyTorch and AI, and I am doing this as a private hobby project.

My code is supposed to take X numbers (floats) from a list and give me back the (X+1)-th number (float), but all I get back is:

for the output tensor:

tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       device='cuda:0', grad_fn=<ThAddBackward>)

and for the loss:

tensor(nan, device='cuda:0', grad_fn=<MseLossBackward>)

I don’t know what this means :confused:

Here is my code, thank you for your help:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import os

DatasetNumber = 1
DataAmount = 2
matrix1 = torch.Tensor(DataAmount * 5)
matrix2 = torch.Tensor(DataAmount * 5)

class TestNetz(nn.Module):

    # Create the network

    def __init__(self):

        super(TestNetz, self).__init__()
        self.lin1 = nn.Linear(DataAmount * 5, DataAmount * 5)  # layers (hidden layers): functions that are learned to map from the input to the output
        self.lin2 = nn.Linear(DataAmount * 5, DataAmount * 5)

    def forward(self, x):
        x = F.log_softmax(self.lin1(x), 0)  # activation function (log_softmax over dim 0)
        x = self.lin2(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]
        num = 1
        for i in size:
            num *= i
        return num  # return after multiplying all dimensions

    # Prepare the data

    k = open('Datenset.txt', 'r')
    lines = k.readlines()

    i = 0
    while i < DataAmount:
        j = 0
        while j < 5:
            matrix1[j + (5 * i)] = float(lines[j + (5 * i) + (DatasetNumber * 5)])
            matrix2[j + (5 * i)] = float(lines[(DatasetNumber * DataAmount) + (DataAmount * 5)])
            j = j + 1
        i = i + 1



    print(matrix1)
    print(matrix2)


netz = TestNetz()
netz = netz.cuda()
print(netz)

if os.path.isfile('TestNetz.pt'):
    netz = torch.load('TestNetz.pt')


for i in range(100):
    # Input
    input = Variable(matrix1)
    input = input.cuda()

    out = netz(input)

    print(out)

    # Target
    target = Variable(matrix2)
    target = target.cuda()
    criterion = nn.MSELoss()  # loss computation
    loss = criterion(out, target)
    #print(loss)

    netz.zero_grad()
    loss.backward()
    optimizer = optim.SGD(netz.parameters(), lr=0.01)  # optimizer (SGD) with learning rate
    optimizer.step()


torch.save(netz, 'TestNetz.pt')

Could you check your input for NaN values? NaN is the only value that is not equal to itself, so the comparison below prints a 0 entry if any NaN is present.
Just use

print((matrix1==matrix1).all())
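
Alternatively, if your PyTorch version already provides it, torch.isnan gives a more direct check (a small sketch, assuming torch is imported and matrix1 is your input tensor):

# non-zero output means at least one element is NaN
print(torch.isnan(matrix1).any())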

There are a few minor issues in your code:

  • torch.Tensor creates an uninitialized tensor. I would recommend using e.g. torch.zeros instead, so that the values are zero in case you don’t initialize them yourself.
  • It’s uncommon to add F.log_softmax between layers. Usually you would only use it after the last linear layer, and only for a classification use case. F.relu would be a common non-linearity between layers.
  • Variables are deprecated since 0.4.0. You can use tensors directly now.
  • Probably not an issue, but you are not closing the file k. A good way is to use with open('Datenset.txt', 'r') as k: so that the file is closed automatically.
  • I would suggest creating the criterion and optimizer outside the for loop (see the sketch below). It’s not that important for the criterion, but an optimizer with running estimates (e.g. Adam) would lose those estimates if you re-created it in each iteration.
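
Putting these points together, here is a minimal sketch of how the model and training loop could look (just a sketch, assuming DataAmount, matrix1 and matrix2 are prepared as in your code; it is not by itself a fix for the NaNs):

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class TestNetz(nn.Module):
    def __init__(self):
        super(TestNetz, self).__init__()
        self.lin1 = nn.Linear(DataAmount * 5, DataAmount * 5)
        self.lin2 = nn.Linear(DataAmount * 5, DataAmount * 5)

    def forward(self, x):
        x = F.relu(self.lin1(x))  # ReLU as the non-linearity between layers
        x = self.lin2(x)          # no activation on the output for regression
        return x

netz = TestNetz().cuda()

# create the criterion and optimizer once, outside the training loop
criterion = nn.MSELoss()
optimizer = optim.SGD(netz.parameters(), lr=0.01)

# plain tensors, no Variable needed since 0.4.0
input = matrix1.cuda()
target = matrix2.cuda()

for i in range(100):
    optimizer.zero_grad()  # clear the gradients from the previous step
    out = netz(input)
    loss = criterion(out, target)
    loss.backward()
    optimizer.step()

Here optimizer.zero_grad() takes over the role of netz.zero_grad() in your code; both clear the parameter gradients before the next backward pass.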

Thank you for your answer,

the output of this line:

print((matrix1==matrix1).all())

was:

tensor(1, dtype=torch.uint8)

so there don’t seem to be any NaNs in the input. All my data are floats; here are the first 100 numbers from my list “Datenset.txt”:

1.19532
1.194
1.19525
1.194
1620
1.19447
1.19387
1.19401
1.19417
1382
1.19461
1.19358
1.19415
1.19378
1508
1.19408
1.19377
1.19377
1.19391
1340
1.19466
1.19386
1.19392
1.19438
1318
1.19488
1.19362
1.19437
1.19417
2254
1.19478
1.19371
1.19417
1.19474
1748
1.19474
1.19414
1.19474
1.19422
1140
1.19454
1.19409
1.19421
1.1944
953
1.19491
1.19435
1.1944
1.19486
963
1.19549
1.19482
1.19486
1.19545
1164
1.19583
1.19458
1.19544
1.19521
2341
1.19687
1.19518
1.1952
1.19669
3691
1.19874
1.1967
1.19671
1.1981
3277
1.19848
1.19732
1.19813
1.19802
2924
1.19891
1.19788
1.19801
1.19869
1970
1.19949
1.19843
1.1987
1.19947
2850
1.19983
1.19822
1.19946
1.19866
2410
1.19947
1.19853
1.19865
1.19926
2262
1.20111
1.19875
1.19924
1.20095
4166

I’ve tried to apply all of your proposals to my code, but the problem is the same:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import os

DatasetNumber = 1
DataAmount = 2
matrix1 = torch.zeros(DataAmount * 5)
matrix2 = torch.zeros(DataAmount * 5)

class TestNetz(nn.Module):

    # Create the network

    def __init__(self):

        super(TestNetz, self).__init__()
        self.lin1 = nn.Linear(DataAmount * 5, DataAmount * 5)  # layers (hidden layers): functions that are learned to map from the input to the output
        self.lin2 = nn.Linear(DataAmount * 5, DataAmount * 5)

    def forward(self, x):
        x = F.relu(self.lin1(x))  # activation function ReLU
        x = self.lin2(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]
        num = 1
        for i in size:
            num *= i
        return num  # return after multiplying all dimensions

    # Prepare the data

    k = open('Datenset.txt', 'r')
    lines = k.readlines()

    i = 0
    while i < DataAmount:
        j = 0
        while j < 5:
            matrix1[j + (5 * i)] = float(lines[j + (5 * i) + (DatasetNumber * 5)])
            matrix2[j + (5 * i)] = float(lines[(DatasetNumber * DataAmount) + (DataAmount * 5)])
            j = j + 1
        i = i + 1

    k.close()

    #print(matrix1)
    #print(matrix2)


netz = TestNetz()
netz = netz.cuda()
#print(netz)

if os.path.isfile('TestNetz.pt'):
    netz = torch.load('TestNetz.pt')


for i in range(100):
    # Input
    input = matrix1
    input = input.cuda()

    out = netz(input)

    #print(out)

    # Target
    target = matrix2
    target = target.cuda()

torch.save(netz, 'TestNetz.pt')

criterion = nn.MSELoss()  # loss computation
loss = criterion(out, target)
# print(loss)

netz.zero_grad()
loss.backward()
optimizer = optim.SGD(netz.parameters(), lr=0.01)  # optimizer (SGD) with learning rate
optimizer.step()

print((matrix1==matrix1).all())
print(matrix1)
print(matrix2)

edit: I’ve checked all entries with print(type(entry)), and every entry is “class float”.

Did you find a solution?

Hello, I ran into the same problem. When I normalized my inputs to the interval [-1, 1], the hidden and output values no longer contained any NaN values. I hope this gives you another option to try.
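
For anyone who wants to try this: here is a minimal sketch of min-max scaling a tensor to [-1, 1] (scale_to_range is a made-up helper name; matrix1 and matrix2 are the tensors from the code above):

import torch

def scale_to_range(x):
    # min-max scale x to [-1, 1]; assumes x is not constant
    x_min, x_max = x.min(), x.max()
    return 2 * (x - x_min) / (x_max - x_min) - 1

matrix1 = scale_to_range(matrix1)
matrix2 = scale_to_range(matrix2)

Since every fifth value in the posted data (e.g. 1620, 1382) is on a very different scale than the ~1.19 values, scaling each of the five features separately would probably work even better.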