Model not being trained

I’m trying to train a model for image classification, but the training just doesn’t seem to happen.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class CNN(nn.Module):
  def __init__(self):
    super(CNN, self).__init__()
    # Convolutional feature extractor
    self.conv1 = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
    self.conv2 = nn.Conv2d(64, 192, kernel_size=7, padding=2)
    self.conv3 = nn.Conv2d(192, 256, kernel_size=5, padding=1)

    # Fully connected classifier for 102 classes
    self.fc1 = nn.Linear(256*9*9, 4096)
    self.fc2 = nn.Linear(4096, 1024)
    self.fc3 = nn.Linear(1024, 102)

  def forward(self, x):
    x = self.conv1(x)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = self.conv2(x)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = self.conv3(x)
    x = F.max_pool2d(x, kernel_size=3, stride=2)

    # Flatten for the fully connected layers
    x = x.view(x.shape[0], -1)

    x = self.fc1(x)
    x = F.relu(x, inplace=True)
    x = self.fc2(x)
    x = F.relu(x, inplace=True)
    x = self.fc3(x)
    # Log-probabilities for nn.NLLLoss
    x = F.log_softmax(x, dim=1)

    return x

model = CNN()

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters())
model.to(device)

NUM_EPOCHS = 10

for epoch in range(NUM_EPOCHS):
  train_loss = 0
  val_loss = 0
  accuracy = 0

  for inputs, labels in trainloader:
    # Move data to the device
    inputs, labels = inputs.to(device), labels.to(device)

    # Forward pass and loss
    optimizer.zero_grad()
    preds = torch.exp(model(inputs))
    loss = criterion(preds, labels)

    # Backprop
    loss.backward()
    optimizer.step()

    train_loss += loss.item() * inputs.size(0)
  
  # Validation
  with torch.no_grad():
    for inputs, labels in validationloader:
      # Move data to the device
      inputs, labels = inputs.to(device), labels.to(device)
      output = model(inputs)
      loss = criterion(output, labels)
      val_loss += loss.item() * inputs.size(0)
      output = torch.exp(output)

      # Calculate accuracy
      top_p, top_class = output.topk(1, dim=1)
      equals = top_class == labels.view(*top_class.shape)
      accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

  # Average the losses over the datasets
  train_loss = train_loss / len(trainloader.dataset)
  val_loss = val_loss / len(validationloader.dataset)

  # Print info
  print('Accuracy: ', accuracy / len(validationloader))
  print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss, val_loss))

What do you mean by “training doesn’t take place”?
Could you elaborate? Does it have anything to do with the training/validation loss?

The loss and the accuracy remain the same. The loss, in fact, is negative, and the accuracy stays close to 1.15%, which is roughly chance level for 102 classes.

May I know why you apply torch.exp() after the forward pass? Also, what criterion are you using?

NLLLoss, with log_softmax inside the network.

I have not used NLLLoss before, but the docs say a log-softmax layer is necessary as the last layer of the network in order to use NLLLoss. Do you have it?
And why torch.exp() at the end?

I’ve used a log_softmax layer inside my network. And from what I’ve seen in some tutorials, torch.exp is needed to convert the log output back to probabilities.

As @mailcorahul said, nn.NLLLoss expects log probabilities, so you shouldn’t apply torch.exp to your output. Instead, pass the F.log_softmax outputs directly to your criterion.

If you need to see the softmax probabilities, you can of course use torch.exp. Just don’t pass them to nn.NLLLoss.
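A minimal sketch of the difference (the batch size and shapes below are made up for illustration):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 102)               # raw outputs for a batch of 4
log_probs = F.log_softmax(logits, dim=1)
targets = torch.randint(0, 102, (4,))

# Correct: log-probabilities -> positive loss, around log(102) ≈ 4.62 for random logits
print(F.nll_loss(log_probs, targets))

# Wrong: probabilities -> nn.NLLLoss just negates the value at the target index,
# so inputs in [0, 1] yield a small negative loss, matching what you observed
print(F.nll_loss(torch.exp(log_probs), targets))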

@aklagoo let us know if it works.

I changed that section and now pass the predictions without torch.exp(). The model still doesn’t train, although I was wrong before: even after multiple epochs, the loss and the accuracy haven’t changed.

Can you post the complete code here (with the network and training loop)?


I’ve updated the code.

Can you try defining the optimizer after moving the model to the GPU?

model.to(device)
optimizer = optim.Adam(model.parameters())

The result is still the same. I checked and found that the model weights are not being updated. I’ve since modified the model and added a few ReLU activation layers, and the weights are finally changing, although I’m not sure why the missing activations would have been an issue.
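For reference, one way to check whether the weights update is to snapshot the parameters around a single optimizer step (a minimal sketch using the model, criterion, optimizer, and trainloader from above):

import copy

# Snapshot all parameters before one training step
before = copy.deepcopy(model.state_dict())

inputs, labels = next(iter(trainloader))
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

# Report any parameters the step left untouched
for name, param in model.state_dict().items():
  if torch.equal(before[name], param):
    print(name, 'did NOT change')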

are you saying the weights were not updated before(optimizer and then moving to gpu), but are changing after swapping the statements?

No, switching the statements did not work. I modified my model, and now the weights are changing. The accuracy is still around 3%, but the loss is changing. The change was roughly as shown below.
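A sketch of the modified forward, with a ReLU after each conv layer (the rest of the model is unchanged):

  def forward(self, x):
    x = F.relu(self.conv1(x), inplace=True)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = F.relu(self.conv2(x), inplace=True)
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = F.relu(self.conv3(x), inplace=True)
    x = F.max_pool2d(x, kernel_size=3, stride=2)

    x = x.view(x.shape[0], -1)

    x = self.fc1(x)
    x = F.relu(x, inplace=True)
    x = self.fc2(x)
    x = F.relu(x, inplace=True)
    x = self.fc3(x)
    return F.log_softmax(x, dim=1)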

Are you sure you’re not running into an exploding/vanishing gradient problem?
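One quick way to check is to print the per-layer gradient norms right after loss.backward() (a minimal sketch; values near zero across many layers point to vanishing gradients, very large values to exploding ones):

loss.backward()
for name, param in model.named_parameters():
  if param.grad is not None:
    # L2 norm of this parameter's gradient
    print(name, param.grad.norm().item())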