(CrossEntropyLoss) Loss becomes NaN after several iterations

Hi all,

I am a newbie to PyTorch and am trying to build a simple classifier on my own. I am trying to train a classifier with 4 classes; the inputs are one-dimensional tensors of length 1000.

This is the architecture of my neural network; I have used BatchNorm layers:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv1d(1, 6, 5)
        self.bn1 = nn.BatchNorm1d(6)
        self.conv2 = nn.Conv1d(6, 16, 1)
        self.bn2 = nn.BatchNorm1d(16)
        self.fc1 = nn.Linear(16 * 996, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.bn4 = nn.BatchNorm1d(84)
        self.fc3 = nn.Linear(84, 4)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.bn3(self.fc1(x)))
        x = F.relu(self.bn4(self.fc2(x)))
        x = F.relu(self.fc3(x))
        return x
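
Just to show how the shapes work out (conv1 with kernel size 5 shortens the length 1000 input to 996, which is where the 16 * 996 in fc1 comes from), here is a quick sanity check:

check = Net()
dummy = torch.randn(2, 1, 1000)   # batch of 2 so the BatchNorm1d layers have a batch to normalize over
print(check(dummy).shape)         # torch.Size([2, 4]), one score per class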

For reference, this is my dataset format:

for i, data in enumerate(data_loader, 0):
    input, label = data
    input, label = input.unsqueeze(1).float().to(device), label.to(device)
    print(input.shape)
    print(label)
    break
torch.Size([64, 1, 1000])
tensor([3, 0, 2, 1, 0, 1, 0, 1, 2, 0, 2, 2, 1, 1, 0, 3, 0, 0, 3, 1, 0, 2, 1, 1,
        1, 2, 2, 2, 0, 1, 0, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 3, 2, 0, 0, 3,
        0, 0, 3, 2, 1, 1, 0, 2, 0, 1, 1, 1, 3, 1, 0, 2], device='cuda:0')

I have just used cross entropy as my loss, and I have tried different optimizers with different learning rates, but they all yielded the same issue:

net = Net()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=1e-10, momentum=0.9)
# optimizer = optim.Adagrad(net.parameters(), lr=1e-5)
# optimizer = optim.Adam(net.parameters(), lr=1e-5)

epochs = 100
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(data_loader, 0):
        softmax_pdf, label, index = data
        softmax_pdf, label = softmax_pdf.unsqueeze(1).float().to(device), label.to(device)

        optimizer.zero_grad()

        outputs = net(softmax_pdf)
        loss = criterion(outputs, label)
        print(loss.item())
        loss.backward()
        print(loss.item())
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 0:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

The outputs look similar to this for all my attempts:

1.4231895208358765
1.4231895208358765
[1,     1] loss: 0.001
1.39090096950531
1.39090096950531
1.3952362537384033
1.3952362537384033
1.3980228900909424
1.3980228900909424
1.3929697275161743
1.3929697275161743
1.3981080055236816
1.3981080055236816
1.448412299156189
1.448412299156189
1.4033156633377075
1.4033156633377075
1.403794288635254
1.403794288635254
1.422071099281311
1.422071099281311
1.4011094570159912
1.4011094570159912
1.411553978919983
1.411553978919983
1.3885104656219482
1.3885104656219482
1.3684502840042114
1.3684502840042114
1.4065546989440918
1.4065546989440918
1.3899188041687012
1.3899188041687012
1.3902665376663208
1.3902665376663208
1.3944473266601562
1.3944473266601562
1.3757106065750122
1.3757106065750122
1.439969539642334
1.439969539642334
1.4425773620605469
1.4425773620605469
1.4459556341171265
1.4459556341171265
1.3976593017578125
1.3976593017578125
1.4449955224990845
1.4449955224990845
1.4072251319885254
1.4072251319885254
1.3999367952346802
1.3999367952346802
1.4288455247879028
1.4288455247879028
1.3832045793533325
1.3832045793533325
1.4006547927856445
1.4006547927856445
1.439096212387085
1.439096212387085
1.4116154909133911
1.4116154909133911
1.4287461042404175
1.4287461042404175
1.4069699048995972
1.4069699048995972
1.40020751953125
1.40020751953125
1.3585326671600342
1.3585326671600342
1.4218270778656006
1.4218270778656006
1.3971164226531982
1.3971164226531982
1.394694209098816
1.394694209098816
1.4159125089645386
1.4159125089645386
1.3854421377182007
1.3854421377182007
1.3808670043945312
1.3808670043945312
1.3779351711273193
1.3779351711273193
1.4056364297866821
1.4056364297866821
1.4281848669052124
1.4281848669052124
1.4305639266967773
1.4305639266967773
1.3785184621810913
1.3785184621810913
1.3812319040298462
1.3812319040298462
1.39437997341156
1.39437997341156
1.4177370071411133
1.4177370071411133
1.4220192432403564
1.4220192432403564
1.4147902727127075
1.4147902727127075
1.4216375350952148
1.4216375350952148
1.4156986474990845
1.4156986474990845
1.416447401046753
1.416447401046753
1.405503511428833
1.405503511428833
1.4071837663650513
1.4071837663650513
1.4007548093795776
1.4007548093795776
1.3982759714126587
1.3982759714126587
1.415954351425171
1.415954351425171
1.3980753421783447
1.3980753421783447
1.429835557937622
1.429835557937622
1.4012715816497803
1.4012715816497803
1.3998645544052124
1.3998645544052124
1.405841588973999
1.405841588973999
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
[2,     1] loss: nan
nan
nan
nan

I have also tried to add a small epsilon to the loss (not sure if this is a reasonable way to add an epsilon), but the issue still exists.
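
Concretely, the epsilon was just a small constant added to the network outputs before computing the loss, along these lines (the full modified script is below):

epsilon = 1e-6
outputs = net(input)
outputs = outputs.add(epsilon)   # shift every output by a tiny constant
loss = criterion(outputs, label)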

Any help would be genuinely appreciated here!

I have modified my training script as follows, adding torch.nn.utils.clip_grad_norm_ right after loss.backward(), but the issue still remains:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv1d(1, 6, 5)
        self.bn1 = nn.BatchNorm1d(6)
        self.conv2 = nn.Conv1d(6, 16, 1)
        self.bn2 = nn.BatchNorm1d(16)
        self.fc1 = nn.Linear(16 * 996, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.bn4 = nn.BatchNorm1d(84)
        self.fc3 = nn.Linear(84, 4)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.bn3(self.fc1(x)))
        x = F.relu(self.bn4(self.fc2(x)))
        x = F.log_softmax(self.fc3(x), dim=1)
        return x

net = Net()
net.to(device)

epsilon = 1e-6
# criterion = nn.CrossEntropyLoss()
criterion = nn.NLLLoss()
# optimizer = optim.SGD(net.parameters(), lr=1e-10, momentum=0.9)
# optimizer = optim.Adagrad(net.parameters(), lr=1e-5)
optimizer = optim.Adam(net.parameters(), lr=1e-10)

epochs = 100
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(data_loader, 0):
        input, label = data
        input, label = input.unsqueeze(1).float().to(device), label.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(input)
        outputs = outputs.add(epsilon)
        loss = criterion(outputs, label)
        print(loss.item())
        loss.backward()
        torch.nn.utils.clip_grad_norm_(net.parameters(), 0.5)
        print(loss.item())
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 100 == 0:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

This is the output I got:

1.5973020792007446
1.5973020792007446
[1,     1] loss: 0.001
1.4559650421142578
1.4559650421142578
1.5160048007965088
1.5160048007965088
1.5282397270202637
1.5282397270202637
1.4953101873397827
1.4953101873397827
1.4367845058441162
1.4367845058441162
1.48429274559021
1.48429274559021
1.5157661437988281
1.5157661437988281
1.4474157094955444
1.4474157094955444
1.4853960275650024
1.4853960275650024
nan
nan
nan
nan
nan
nan
nan

What am I doing wrong here? I desperately need some help.

Could you check your input values for Infs and NaNs via:

if torch.isnan(input) or torch.isinf(input):
    print('invalid input detected at iteration ', i)

Hi @ptrblck, thanks a lot for your reply. I have tried to insert the code into my script and got a runtime error:

criterion = nn.NLLLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-10)

epochs = 100
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(data_loader, 0):
        input, label = data
        if torch.isnan(input) or torch.isinf(input):
            print('invalid input detected at iteration ', i)
            break
        input, label = input.unsqueeze(1).float().to(device), label.to(device)

        optimizer.zero_grad()
        outputs = net(input)
        outputs = outputs.add(epsilon)
        loss = criterion(outputs, label)

        print(loss.item())
        loss.backward()
        torch.nn.utils.clip_grad_norm_(net.parameters(), 0.5)
        print(loss.item())
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 100 == 0:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
     18         # get the inputs; data is a list of [inputs, labels]
     19         softmax_pdf, label, index = data
---> 20         if torch.isnan(input) or torch.isinf(input):
     21             print('invalid input detected at iteration ', i)
     22             break

RuntimeError: bool value of Tensor with more than one value is ambiguous

I assume you might have meant this instead:

if torch.isnan(sum(sum(input))) or torch.isinf(sum(sum(input))):
    print('invalid input detected at iteration ', i)

So I replaced the check with this one. There is no runtime error now, but it seems it is not the input's problem: I still get the same NaN after a while, without ever seeing the 'invalid input detected at iteration' message.

criterion = nn.NLLLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-10)

epochs = 100
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(data_loader, 0):
        input, label = data
        if torch.isnan(sum(sum(input))) or torch.isinf(sum(sum(input))):
            print('invalid input detected at iteration ', i)
            break
        input, label = input.unsqueeze(1).float().to(device), label.to(device)

        optimizer.zero_grad()
        outputs = net(input)
        outputs = outputs.add(epsilon)
        loss = criterion(outputs, label)

        print(loss.item())
        loss.backward()
        torch.nn.utils.clip_grad_norm_(net.parameters(), 0.5)
        print(loss.item())
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 100 == 0:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

This is what I still got:

1.4137518405914307
1.4137518405914307
[1,     1] loss: 0.001
1.4274098873138428
1.4274098873138428
1.42104172706604
1.42104172706604
1.4518089294433594
1.4518089294433594
1.4289991855621338
1.4289991855621338
1.4246625900268555
1.4246625900268555
1.3915072679519653
1.3915072679519653
1.4070930480957031
1.4070930480957031
1.490980863571167
1.490980863571167
1.5186127424240112
1.5186127424240112
1.3903168439865112
1.3903168439865112
1.455558180809021
1.455558180809021
1.4122167825698853
1.4122167825698853
1.4350025653839111
1.4350025653839111
1.4551273584365845
1.4551273584365845
1.4294865131378174
1.4294865131378174
1.4452277421951294
1.4452277421951294
1.3719713687896729
1.3719713687896729
1.4808259010314941
1.4808259010314941
1.4451615810394287
1.4451615810394287
1.4479520320892334
1.4479520320892334
1.504454255104065
1.504454255104065
1.493384838104248
1.493384838104248
1.5036394596099854
1.5036394596099854
1.4297422170639038
1.4297422170639038
1.4111098051071167
1.4111098051071167
1.3866780996322632
1.3866780996322632
1.3785628080368042
1.3785628080368042
1.379706859588623
1.379706859588623
1.4556639194488525
1.4556639194488525
1.377264142036438
1.377264142036438
1.456748127937317
1.456748127937317
1.3824539184570312
1.3824539184570312
1.4618127346038818
1.4618127346038818
1.3796257972717285
1.3796257972717285
1.4378615617752075
1.4378615617752075
1.422987699508667
1.422987699508667
1.408454179763794
1.408454179763794
1.4041879177093506
1.4041879177093506
1.4276624917984009
1.4276624917984009
1.4458825588226318
1.4458825588226318
1.5070369243621826
1.5070369243621826
1.4582675695419312
1.4582675695419312
1.4353320598602295
1.4353320598602295
1.4360344409942627
1.4360344409942627
1.4060547351837158
1.4060547351837158
1.3721834421157837
1.3721834421157837
1.4859342575073242
1.4859342575073242
1.4752308130264282
1.4752308130264282
1.5029387474060059
1.5029387474060059
1.5103423595428467
1.5103423595428467
1.4800370931625366
1.4800370931625366
1.3849263191223145
1.3849263191223145
1.436884880065918
1.436884880065918
1.417528748512268
1.417528748512268
1.492363691329956
1.492363691329956
1.4123766422271729
1.4123766422271729
1.4054125547409058
1.4054125547409058
1.4886075258255005
1.4886075258255005
1.4721190929412842
1.4721190929412842
1.4615099430084229
1.4615099430084229
1.4320951700210571
1.4320951700210571
1.4198548793792725
1.4198548793792725
1.417236089706421
1.417236089706421
1.4173119068145752
1.4173119068145752
1.4298810958862305
1.4298810958862305
1.3706982135772705
1.3706982135772705
1.4235960245132446
1.4235960245132446
1.3855185508728027
1.3855185508728027
1.4296832084655762
1.4296832084655762
1.4478527307510376
1.4478527307510376
1.3492720127105713
1.3492720127105713
1.494626522064209
1.494626522064209
nan
nan
nan
nan
nan
nan
nan

Actually, I have regenerated the input data and made sure none of the samples has a sum == 0. Now I am really confused about what could be wrong with my model or data.
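
For completeness, this is roughly the kind of check I ran over the regenerated data (a sketch; it just scans every batch from the same data_loader as above before any training):

for i, data in enumerate(data_loader, 0):
    input, label = data
    input = input.float()
    per_sample_sum = input.sum(dim=1)   # input is [batch, 1000] before the unsqueeze
    if (per_sample_sum == 0).any() or torch.isnan(input).any() or torch.isinf(input).any():
        print('suspicious batch detected at iteration ', i)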

Try normalizing the dataset as below, so that for each example the minimum and maximum values are 0.0 and 1.0 respectively.

image = (image - image.min())/(image.max() - image.min())
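
Applied per sample inside the training loop, that would look roughly like this (a sketch; amin/amax reduce over the length dimension so each example is scaled independently, and the small constant in the denominator guards against constant inputs):

input, label = data
input = input.unsqueeze(1).float().to(device)
mins = input.amin(dim=2, keepdim=True)   # per-sample minimum, shape [batch, 1, 1]
maxs = input.amax(dim=2, keepdim=True)   # per-sample maximum
input = (input - mins) / (maxs - mins + 1e-8)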

The normalization solves the problem this time, so I assume it was the extreme variance of my data that caused the problem! Thank you so much, braindotai! :smiley:

Try torch.isnan(input).any()
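
i.e. roughly this, which reduces the element-wise result to a single bool and avoids the ambiguity error:

if torch.isnan(input).any() or torch.isinf(input).any():
    print('invalid input detected at iteration ', i)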