Why does the loss function always return zero after the first epoch?

Why is my loss function always printing zero after the first epoch?

I suspect it’s because of loss = loss_fn(outputs, torch.max(labels, 1)[1]).

If I use loss = loss_fn(outputs, torch.max(labels, 1)[0]) instead, I get values that seem too high and I'm not sure they make sense, like 1200, 800, 600, 500 (one value per epoch).

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

nepochs = 5
losses = np.zeros(nepochs)

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(modell.parameters(), lr=0.001)

for epoch in range(nepochs):
    running_loss = 0.0
    n = 0

    for data in train_loader:
        # single batch
        if n == 1:
            break

        inputs, labels = data

        optimizer.zero_grad()
        outputs = modell(inputs)

        # loss = loss_fn(outputs, labels)
        loss = loss_fn(outputs, torch.max(labels, 1)[1])
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        n += 1

    losses[epoch] = running_loss / n
    print(f"epoch: {epoch+1} loss: {losses[epoch]:.3f}")

The model is:

class Classifier(nn.Module):
    def __init__(self, labels=10):
        super(Classifier, self).__init__()
        self.fc = nn.Linear(3 * 64 * 64, labels)

    def forward(self, x):
        out = x.reshape(x.size(0), -1)  # flatten [N, 3, 64, 64] -> [N, 3*64*64]
        out = self.fc(out)
        return out

The labels variable is a tensor of shape [64, 1], like this:
tensor([[7],[1],[2],[3],[2],[9],[9],[8],[9],[8],[1],[7],[9],[2],[5],[1],[3],[3],[8],[3],[7],[1],[7],[9],[8],[8],[3],[7],[5],[1],[7],[3],[2],[1],[3],[3],[2],[0],[3],[4],[0],[7],[1],[8],[4],[1],[5],[3],[4],[3],[4],[8],[4],[1],[9],[7],[3],[2],[6],[4],[8],[3],[7],[3]])

Your labels tensor already contains class indices, but it has an extra, unnecessary dimension of size 1 (shape [64, 1]).
The right approach is to use labels = labels.squeeze(1) and pass that to the criterion.
Using torch.max(labels, dim=1)[0] would yield the same result, since the max over a dimension of size 1 is just the value itself.
However, torch.max(labels, dim=1)[1] returns the indices of the maximum along dim 1, and because that dimension has size 1, every index is 0. Your targets are therefore all zeros, which explains the vanishing loss: the model simply learns to always predict class 0.
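
For reference, here is a minimal sketch of the difference, using a dummy 4-sample batch and random logits rather than your actual data:

import torch
import torch.nn as nn

labels = torch.tensor([[7], [1], [2], [3]])  # shape [4, 1]: class indices with an extra dim

print(torch.max(labels, dim=1)[1])  # tensor([0, 0, 0, 0]) -- argmax over a size-1 dim is always 0
print(torch.max(labels, dim=1)[0])  # tensor([7, 1, 2, 3]) -- the max values, i.e. the indices themselves
print(labels.squeeze(1))            # tensor([7, 1, 2, 3]) -- the recommended fix

loss_fn = nn.CrossEntropyLoss()
outputs = torch.randn(4, 10)                 # dummy logits for 10 classes
loss = loss_fn(outputs, labels.squeeze(1))   # target must be a 1D LongTensor of class indices
print(loss.item())

In your training loop that means replacing loss = loss_fn(outputs, torch.max(labels, 1)[1]) with loss = loss_fn(outputs, labels.squeeze(1)).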
