What is the requires_grad state when we load a pretrained model?

Gal_Elias · January 20, 2021, 1:20am

Hello,
If I write this line of code:
model = torchvision.models.googlenet(pretrained=True)
and train the model as is, without any modifications, which layers will be trained? What is the default requires_grad state? Also, the output size (number of classes) of the pretrained model is 1000, yet I trained it on a 2-class dataset without modifying the last layer. How is that possible?

Thanks!

SumanthRH · January 20, 2021, 5:08am

Hey. You can simply run the following block of code:

for parameter in model.parameters():
    print(parameter.requires_grad)

This prints the default state of each model parameter. By default, requires_grad is True for every parameter, so all layers are trained, and the model is in train mode (model.training is True).
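If it helps, here is a minimal sketch of the same check, extended to count trainable parameters and to freeze everything except the last layer. The small nn.Sequential network is a hypothetical stand-in so the snippet runs without downloading weights; it is not GoogLeNet, but the requires_grad mechanics should be the same for any nn.Module.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network (NOT GoogLeNet), used only so the example
# runs without downloading pretrained weights.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1000),
)

# Default state: every parameter requires grad and the module is in train mode.
all_trainable = all(p.requires_grad for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"train mode: {model.training}, all trainable: {all_trainable}")

# Freeze every parameter, then re-enable gradients for the last layer only.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[-1].parameters():
    p.requires_grad_(True)

trainable_after = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters after freezing: {trainable_after} of {total}")
```

When freezing like this, it is common to pass only the parameters that still require grad to the optimizer, for example with a filter on p.requires_grad.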

Gal_Elias · January 20, 2021, 5:54pm

Oh nice, thanks! Do you happen to know why I can train the model on a dataset with a different number of classes, even though I didn’t change the last layer to match that number?

SumanthRH · January 21, 2021, 9:36am

What’s the shape of the label tensor you’re passing? And are the predictions of shape (batch_size, 1000)? It would help if you could post the code block you’re running.

Gal_Elias · January 21, 2021, 12:09pm

This is the training part:

### Imports ###
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

### Set Device ###
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device used: {device}")

### Load pretrain model ###
model = torchvision.models.googlenet(pretrained=True)
model.to(device)

### Loss and Optimizer ###
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

### Train Network ###
for epoch in range(num_epochs):
    losses = []

    for batch_idx, (data, targets) in enumerate(train_loader): # data - images. targets - correct labels
        # Get data to cuda if possible
        data = data.to(device=device)
        targets = targets.to(device=device)

        # Forward
        scores = model(data)
        loss = criterion(scores, targets)
        losses.append(loss.item())

        # Backward
        optimizer.zero_grad() # Reset the gradients for each batch before the calculations so it won't use previous batch's gradients
        loss.backward()

        # Gradient descent or adam step
        optimizer.step() # Update the weights

    print(f'Loss at epoch #{epoch+1}: {sum(losses)/len(losses):.4f}')

The dataset is the dogs_cats dataset, with label 1 for dog and 0 for cat. So the label tensor has shape (batch,) and the predictions tensor has shape (batch, 1000).

SumanthRH · January 21, 2021, 12:38pm

The PyTorch docs for nn.CrossEntropyLoss specify the shapes of its arguments: the input is (N, C), where C is the number of classes, and the target is (N), with each value a class index in the range [0, C-1].

With a 1000-output model, C is 1000, so the loss accepts any target value in the range [0, 999]. Labels 0 and 1 from your two-class dataset fall inside that range, so no error is raised. What you’re essentially doing is training a model for 1000-class classification and only showing it examples from the first two classes.
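To make this concrete, here is a rough sketch of both points: the loss accepting 0/1 labels against 1000 logits, and the usual fix of replacing the final layer with a 2-output one. The random scores tensor and the TinyNet class are hypothetical stand-ins so the snippet runs without pretrained weights. torchvision's GoogLeNet also names its final linear layer fc, so the same one-line replacement should apply there, but treat that as something to verify on your own model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# Stand-in for the model output: 4 samples, 1000 logits each.
scores = torch.randn(4, 1000)
targets = torch.tensor([0, 1, 1, 0])  # two-class labels, all within [0, 999]
loss = criterion(scores, targets)     # accepted: no shape or range error
print(f"loss with 1000 logits and 0/1 labels: {loss.item():.4f}")

# A label outside [0, C-1] is rejected (here C = 2 and one label is 2).
try:
    criterion(torch.randn(4, 2), torch.tensor([0, 1, 2, 0]))
    out_of_range_rejected = False
except (IndexError, RuntimeError):
    out_of_range_rejected = True
print(f"out-of-range label rejected: {out_of_range_rejected}")


class TinyNet(nn.Module):
    """Hypothetical stand-in with a final layer named fc."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyNet()
# Replace the 1000-way head with a freshly initialized 2-way head.
model.fc = nn.Linear(model.fc.in_features, 2)
out = model(torch.randn(4, 3, 32, 32))
print(f"output shape after replacing the head: {tuple(out.shape)}")
```

With a 2-output head, the predictions have shape (batch, 2) and the 0/1 labels cover every class the model can predict.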

Gal_Elias · January 21, 2021, 12:44pm

Oh I see… thank you very much!