Model not being trained

I’m trying to train a model for image classification, but the training just doesn’t seem to happen.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class CNN(nn.Module):
  def __init__(self):
    super(CNN, self).__init__()
    # Convolutional feature extractor
    self.conv1 = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
    self.conv2 = nn.Conv2d(64, 192, kernel_size=7, padding=2)
    self.conv3 = nn.Conv2d(192, 256, kernel_size=5, padding=1)

    # Fully connected classifier for 102 classes
    self.fc1 = nn.Linear(256*9*9, 4096)
    self.fc2 = nn.Linear(4096, 1024)
    self.fc3 = nn.Linear(1024, 102)

  def forward(self, x):
    x = self.conv1(x)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = self.conv2(x)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = self.conv3(x)
    x = F.max_pool2d(x, kernel_size=3, stride=2)

    # Flatten for the fully connected layers
    x = x.view(x.shape[0], -1)

    x = self.fc1(x)
    x = F.relu(x, inplace=True)
    x = self.fc2(x)
    x = F.relu(x, inplace=True)
    x = self.fc3(x)
    # Log-probabilities for nn.NLLLoss
    x = F.log_softmax(x, dim=1)

    return x

model = CNN()

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters())
model.to(device)

NUM_EPOCHS = 10

for epoch in range(NUM_EPOCHS):
  train_loss = 0
  val_loss = 0
  accuracy = 0

  for inputs, labels in trainloader:
    # Move data to the device
    inputs, labels = inputs.to(device), labels.to(device)

    # Forward pass and loss
    optimizer.zero_grad()
    preds = torch.exp(model(inputs))
    loss = criterion(preds, labels)

    # Backprop
    loss.backward()
    optimizer.step()

    train_loss += loss.item() * inputs.size(0)
  
  # Validation
  with torch.no_grad():
    for inputs, labels in validationloader:
      # Move data to the device
      inputs, labels = inputs.to(device), labels.to(device)
      output = model(inputs)
      loss = criterion(output, labels)
      val_loss += loss.item() * inputs.size(0)
      output = torch.exp(output)

      # Calculate accuracy
      top_p, top_class = output.topk(1, dim=1)
      equals = top_class == labels.view(*top_class.shape)
      accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

  # Average the losses over the datasets
  train_loss = train_loss / len(trainloader.dataset)
  val_loss = val_loss / len(validationloader.dataset)

  # Print info
  print('Accuracy: ', accuracy / len(validationloader))
  print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss, val_loss))

What do you mean by “training doesn’t take place”?
Could you elaborate? Does it have anything to do with the training/validation loss?

The loss and the accuracy remain the same. The loss, in fact, is negative, and the accuracy stays close to 1.15%, which is roughly chance level for 102 classes.

May I know why you apply torch.exp() after the forward pass? Also, what criterion are you using?

NLLLoss, with log_softmax inside the network.

I have not used NLLLoss before, but the docs say a log-softmax layer is necessary as the last layer of the network in order to use NLLLoss. Do you have it?
And why torch.exp() at the end?

I’ve used a log_softmax layer inside my network. And from what I’ve seen in some tutorials, torch.exp is needed to convert the log output back to probabilities.

As @mailcorahul said, nn.NLLLoss expects log probabilities, so you shouldn’t apply torch.exp to your output. Instead, pass the F.log_softmax outputs directly to your criterion.

If you need to see the softmax probabilities, you can of course use torch.exp. Just don’t pass them to nn.NLLLoss.
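A minimal sketch of the difference (the batch size and shapes below are made up for illustration):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 102)               # raw outputs for a batch of 4
log_probs = F.log_softmax(logits, dim=1)
targets = torch.randint(0, 102, (4,))

# Correct: log-probabilities -> positive loss, around log(102) ≈ 4.62 for random logits
print(F.nll_loss(log_probs, targets))

# Wrong: probabilities -> nn.NLLLoss just negates the value at the target index,
# so inputs in [0, 1] yield a small negative loss, matching what you observed
print(F.nll_loss(torch.exp(log_probs), targets))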

@aklagoo let us know if it works.

I changed that section and now pass the predictions without torch.exp(). The model still doesn’t train, although I was wrong before: even after multiple epochs, the loss and the accuracy haven’t changed.

Can you post the complete code here (with the network and training loop)?


I’ve updated the code.

Can you try defining the optimizer after moving the model to the GPU?

model.to(device)
optimizer = optim.Adam(model.parameters())

The result is still the same. I checked and found that the model weights are not being updated. I’ve since modified the model and added a few ReLU activation layers, and the weights are finally changing, although I’m not sure why the missing activations would have been an issue.
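For reference, one way to check whether the weights update is to snapshot the parameters around a single optimizer step (a minimal sketch using the model, criterion, optimizer, and trainloader from above):

import copy

# Snapshot all parameters before one training step
before = copy.deepcopy(model.state_dict())

inputs, labels = next(iter(trainloader))
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

# Report any parameters the step left untouched
for name, param in model.state_dict().items():
  if torch.equal(before[name], param):
    print(name, 'did NOT change')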

are you saying the weights were not updated before(optimizer and then moving to gpu), but are changing after swapping the statements?

No, switching the statements did not work. I modified my model, and now the weights are changing. The accuracy is still around 3%, but the loss is changing. The change was roughly as shown below.
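A sketch of the modified forward, with a ReLU after each conv layer (the rest of the model is unchanged):

  def forward(self, x):
    x = F.relu(self.conv1(x), inplace=True)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = F.relu(self.conv2(x), inplace=True)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = F.relu(self.conv3(x), inplace=True)
    x = F.max_pool2d(x, kernel_size=3, stride=2)

    x = x.view(x.shape[0], -1)

    x = self.fc1(x)
    x = F.relu(x, inplace=True)
    x = self.fc2(x)
    x = F.relu(x, inplace=True)
    x = self.fc3(x)
    return F.log_softmax(x, dim=1)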

Are you sure you’re not running into an exploding/vanishing gradient problem?
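One quick way to check is to print the per-layer gradient norms right after loss.backward() (a minimal sketch; values near zero across many layers point to vanishing gradients, very large values to exploding ones):

loss.backward()
for name, param in model.named_parameters():
  if param.grad is not None:
    # L2 norm of this parameter's gradient
    print(name, param.grad.norm().item())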