What is the requires_grad state when we load a pretrained model?

Hello,
If I write this line of code:
model = torchvision.models.googlenet(pretrained=True)
and train the model as is, without any modifications, which layers will be trained? What is the default requires_grad state? Also, the output size (number of classes) of the pretrained model is 1000, yet I trained it on a 2-class dataset without modifying the last layer. How is that possible?

Thanks!


Hey. To check the default state of the model parameters, you can simply run the following block of code:

for parameter in model.parameters():
    print(parameter.requires_grad)

By default, requires_grad is True for every parameter, so all layers are trained, and the model is in train mode (model.training is True).
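If printing one line per parameter is too noisy, the same check can be summarised as a count. The snippet below is only a sketch: it uses a small hypothetical nn.Sequential as a stand-in so that it runs without downloading the pretrained weights, and the layer sizes are made up for illustration. The same expressions should work unchanged on the googlenet model from the question.

```python
import torch.nn as nn

# Hypothetical stand-in network; substitute the pretrained model here.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Count all parameters and those that will receive gradients.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable} / {total}")
print(f"train mode: {model.training}")

# For contrast: switching the flag off on the first layer excludes it from training.
for p in model[0].parameters():
    p.requires_grad_(False)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"frozen parameters after the change: {frozen}")
```

On a freshly constructed or freshly loaded module the two counts match, which is another way of saying that every layer is trained unless you change the flag yourself.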


Oh nice, thanks! Do you maybe know why I can train the model on a dataset with a different number of classes, even if I don’t change the last layer to fit this number of classes?

What’s the shape of the label tensor you’re passing? And do the predictions have shape (batch_size, 1000)? It would help if you could share the code block you’re running.

This is the training part:

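import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

# learning_rate, num_epochs and train_loader are defined earlier in the script (not shown)

### Set Device ###
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device used: {device}")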

### Load pretrain model ###
model = torchvision.models.googlenet(pretrained=True)
model.to(device)

### Loss and Optimizer ###
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

### Train Network ###
for epoch in range(num_epochs):
    losses = []

    for batch_idx, (data, targets) in enumerate(train_loader): # data - images. targets - correct labels
        # Get data to cuda if possible
        data = data.to(device=device)
        targets = targets.to(device=device)

        # Forward
        scores = model(data)
        loss = criterion(scores, targets)
        losses.append(loss.item())

        # Backward
        optimizer.zero_grad() # Reset the gradients for each batch before the calculations so it won't use previous batch's gradients
        loss.backward()

        # Gradient descent or adam step
        optimizer.step() # Update the weights

    print(f'Loss at epoch #{epoch+1}: {sum(losses)/len(losses):.4f}')

The dataset is the dogs_cats dataset, with label 1 for dog and 0 for cat. So the label tensor has shape (batch), and the predictions tensor has shape (batch, 1000).

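The PyTorch docs for nn.CrossEntropyLoss describe the input arguments as follows: the input has shape (N, C), where C is the number of classes, and the target holds class indices with shape (N), where each value must be in the range [0, C-1].

Even if you have only two classes in your dataset, the loss will still accept your targets, because with C = 1000 it only requires every target value to be in the range [0, 1000-1], and your labels 0 and 1 satisfy that. So no error is raised. What you’re essentially doing is training a model for 1000-class classification and only showing it examples from the first two classes.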
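Here is a small sketch of that behaviour using random tensors in place of real model outputs, so it runs without the dataset or the pretrained weights. The batch size and the standalone nn.Linear layers are assumptions made for illustration. As far as I know, the final layer of torchvision's GoogLeNet is the fc attribute, a Linear layer with 1024 input features, so on the real model the usual fix would be something like model.fc = nn.Linear(model.fc.in_features, 2).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

batch_size, num_outputs = 4, 1000
scores = torch.randn(batch_size, num_outputs)  # stand-in for model(data)
targets = torch.tensor([0, 1, 1, 0])           # cat = 0, dog = 1

# Accepted: every target value lies in [0, 999].
loss = criterion(scores, targets)
print(f"loss with 1000 outputs: {loss.item():.4f}")

# A target outside [0, C-1] is rejected.
try:
    criterion(scores, torch.tensor([0, 1, 1000, 0]))
    out_of_range_rejected = False
except (IndexError, RuntimeError):
    out_of_range_rejected = True
print(f"out-of-range target rejected: {out_of_range_rejected}")

# Stand-in for swapping the classification head for a 2-class one.
old_head = nn.Linear(1024, 1000)                # hypothetical pretrained final layer
new_head = nn.Linear(old_head.in_features, 2)   # fresh layer with 2 outputs
features = torch.randn(batch_size, old_head.in_features)
two_class_scores = new_head(features)
two_class_loss = criterion(two_class_scores, targets)
print(f"2-class scores shape: {tuple(two_class_scores.shape)}")
```

With the 2-output head, the predictions have shape (batch, 2), so the model no longer spends capacity on 998 classes it never sees.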

Oh, I see… thank you very much!