Getting NaN loss with CrossEntropyLoss

Hi all.
I’m new to PyTorch and I’m trying to build my own classifier. My dataset has nearly 30,000 images in 52 classes, and each image is 60x80 pixels.
This is my network (I’m not sure how many neurons each layer should have):

import torch
import torch.nn as nn
import torch.nn.functional as F

class my_network(nn.Module):

    def __init__(self, class_num, act=F.relu):
        super(my_network, self).__init__()

        # Fully connected layers; each image is flattened to 1 * 60 * 80 = 4800 inputs
        self.layer1 = nn.Linear(1 * 60 * 80, 50 * 30 * 40)
        self.act1 = act

        self.layer2 = nn.Linear(50 * 30 * 40, 70 * 10 * 15)
        self.act2 = act

        self.layer3 = nn.Linear(70 * 10 * 15, 90 * 5 * 8)
        self.act3 = act

        self.layer4 = nn.Linear(90 * 5 * 8, 80)
        self.act4 = act

        # Output layer returns raw logits; nn.CrossEntropyLoss applies log-softmax itself
        self.layer5 = nn.Linear(80, class_num)

    def forward(self, x):
        # Flatten each image in the batch into a single vector
        x = x.view(x.size(0), -1)

        x = self.layer1(x)
        x = self.act1(x)

        x = self.layer2(x)
        x = self.act2(x)

        x = self.layer3(x)
        x = self.act3(x)

        x = self.layer4(x)
        x = self.act4(x)

        x = self.layer5(x)
        return x
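Since the post is unsure about the layer sizes, it can help to see what this architecture costs. A minimal plain-Python sketch, assuming `class_num = 52` as stated in the question: each `nn.Linear(in, out)` holds `in * out` weights plus `out` biases, so the total can be worked out without building the model.

```python
def linear_params(in_features: int, out_features: int) -> int:
    # nn.Linear stores a weight matrix (out x in) and a bias vector (out)
    return in_features * out_features + out_features


def count_params(class_num: int) -> int:
    # Layer widths of my_network, from the flattened input to the logits
    sizes = [1 * 60 * 80, 50 * 30 * 40, 70 * 10 * 15, 90 * 5 * 8, 80, class_num]
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


total = count_params(52)
print(f"{total:,} parameters")                 # 956,166,392 parameters
print(f"{total * 4 / 1e9:.2f} GB as float32")  # 3.82 GB as float32
```

For an instantiated model, `sum(p.numel() for p in model.parameters())` gives the same count. Almost all of it sits in the first two layers, so the widths are worth revisiting, although size alone does not explain a NaN loss.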

I run the model on CUDA, with CrossEntropyLoss as the criterion and SGD as the optimizer:

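device = torch.device("cuda")  # the model is trained on a CUDA GPU
model = my_network(len(classes))  # classes: the 52 class labels of my dataset
model = model.to(device)

learning_rate = 0.01
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)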
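`nn.CrossEntropyLoss` takes the raw logits returned by `layer5` and, per sample, computes the log-sum-exp of the logits minus the logit of the target class. A minimal pure-Python sketch of that formula for one sample (the max-subtraction trick is the standard way to keep `exp` from overflowing; this is an illustration, not PyTorch's actual code) shows that very large but finite logits are harmless, while a single infinite logit already turns the loss into NaN:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: logsumexp(logits) - logits[target]."""
    m = max(logits)  # subtracting the max keeps math.exp from overflowing
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


print(cross_entropy([2.0, 1.0, 0.1], 0))           # about 0.417
print(cross_entropy([1000.0, 0.0, -1000.0], 0))    # 0.0: huge but finite logits are fine
print(cross_entropy([float("inf"), 0.0, 0.0], 0))  # nan: inf - inf appears inside
```

So a NaN coming out of the criterion most likely means the logits reaching it were already inf or NaN, which is worth checking before suspecting the loss function itself.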

I train the model with the following loop:

for epoch in range(num_epochs):
    train_loss = 0.

    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        print(loss.item())
        train_loss += loss.item()

    average_loss = train_loss / len(train_loader)

When I run this, the loss is NaN: loss.item() already returns nan in the first epoch.

nan
nan
nan
nan
...
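One way to make the failure easier to inspect is to stop at the first non-finite loss and report the input range and logit magnitude for that batch. This is a sketch, not the fix: `train_one_epoch` and its `max_grad_norm` argument are hypothetical names, and gradient clipping with `torch.nn.utils.clip_grad_norm_` is a swapped-in technique the thread does not mention; it only caps the size of each update. The tiny model and random batches at the end exist only so the snippet runs on its own.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, loader, criterion, optimizer, device, max_grad_norm=None):
    """One pass over `loader`; returns the average loss, or None at the first non-finite loss.

    `max_grad_norm` is a hypothetical parameter of this sketch that enables gradient clipping.
    """
    model.train()
    total = 0.0
    for step, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)

        if not torch.isfinite(loss):
            # Stop and describe the failing batch instead of printing nan repeatedly
            print(f"non-finite loss at step {step}: "
                  f"inputs in [{images.min().item():.3g}, {images.max().item():.3g}], "
                  f"max |logit| = {outputs.abs().max().item():.3g}")
            return None

        loss.backward()
        if max_grad_norm is not None:
            # Rescale gradients so their combined L2 norm is at most max_grad_norm
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        total += loss.item()
    return total / len(loader)


# Tiny stand-in model and random batches so the sketch runs on its own
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(12, 8), nn.ReLU(), nn.Linear(8, 3))
loader = [(torch.rand(4, 1, 3, 4), torch.randint(0, 3, (4,))) for _ in range(5)]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

avg = train_one_epoch(model, loader, criterion, optimizer, torch.device("cpu"), max_grad_norm=1.0)
print(avg)
```

If the message appears at step 0, no update has happened yet, so the NaN is produced by the very first forward pass; if it only appears later, the updates are diverging, and a learning rate below 0.01 or clipping may be worth trying.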

Also, I don’t want to normalize my data; I want to use it as it is.
What am I doing wrong?
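If the images are fed in as raw 0-255 pixel values (the thread has not confirmed the range; images loaded through `torchvision.transforms.ToTensor` would already be in [0, 1]), the first layer sees much larger numbers than its default initialization expects. The plain-Python sketch below draws weights the way `nn.Linear` does by default, U(-1/sqrt(fan_in), 1/sqrt(fan_in)), and compares the spread of first-layer pre-activations for a hypothetical random 60x80 image before and after dividing by 255. `first_layer_preacts` is a made-up helper for this illustration.

```python
import math
import random
import statistics


def first_layer_preacts(image, n_neurons, seed=0):
    """Pre-activations w.x + b of n_neurons freshly initialized linear units.

    Weights and biases are drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)),
    like PyTorch's default nn.Linear initialization. A fixed seed means
    repeated calls use identical weights.
    """
    rng = random.Random(seed)
    fan_in = len(image)
    bound = 1 / math.sqrt(fan_in)
    outs = []
    for _ in range(n_neurons):
        w = [rng.uniform(-bound, bound) for _ in range(fan_in)]
        b = rng.uniform(-bound, bound)
        outs.append(sum(wi * xi for wi, xi in zip(w, image)) + b)
    return outs


rng = random.Random(42)
raw = [rng.randint(0, 255) for _ in range(60 * 80)]  # hypothetical image, raw pixels
scaled = [p / 255 for p in raw]                      # the same image divided by 255

raw_std = statistics.pstdev(first_layer_preacts(raw, 50))
scaled_std = statistics.pstdev(first_layer_preacts(scaled, 50))
print(f"std of pre-activations, raw 0-255 pixels: {raw_std:.2f}")
print(f"std of pre-activations, pixels / 255:     {scaled_std:.3f}")
```

The raw-pixel pre-activations come out roughly 255 times larger. The first layer's weight gradients scale with the inputs too, since a linear layer's weight gradient is the upstream gradient multiplied by its input, and with SGD at lr=0.01 that can be enough to make training diverge. Dividing by 255 is a fixed rescaling rather than dataset-based normalization, so it may still fit the "no normalization" requirement.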

What range are your input values in at the moment?
Do the NaN outputs already appear in the first iteration, or only after a couple of updates?
In the latter case, you could add torch.autograd.set_detect_anomaly(True) at the beginning of the script, which should point to the operation that created the first NaN output.
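As a complement to the anomaly-detection suggestion: `torch.autograd.set_detect_anomaly(True)` checks the backward pass, raising an error when a backward function returns NaN and printing the traceback of the forward operation involved. If the very first loss is already NaN, the forward pass can be checked directly. A minimal sketch using forward hooks; `add_nonfinite_hooks` is a hypothetical helper, and the small `nn.Sequential` with one deliberately corrupted weight stands in for `my_network`.

```python
import torch
import torch.nn as nn


def add_nonfinite_hooks(model):
    """Record, in call order, the names of submodules whose output contains NaN or inf.

    Returns (found, handles); call h.remove() on every handle when done.
    """
    found = []
    handles = []
    for name, module in model.named_modules():
        if name == "":  # skip the top-level model itself
            continue

        def hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                found.append(name)

        handles.append(module.register_forward_hook(hook))
    return found, handles


# Stand-in model with one deliberately corrupted weight (illustration only)
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
with torch.no_grad():
    model[2].weight[0, 0] = float("nan")

found, handles = add_nonfinite_hooks(model)
with torch.no_grad():
    model(torch.ones(1, 4))
print(found)  # ['2']: the first module whose output is non-finite

for h in handles:
    h.remove()
```

With `my_network`, the hooked modules would be `layer1` to `layer5`. The activations are plain `F.relu` calls rather than modules, so they are not hooked separately, but ReLU cannot turn finite inputs into NaN or inf anyway.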
