Use transfer learning on a CNN to train on 2 different datasets (CV)

I am trying to replicate a paper that trains a CNN on two different datasets using transfer learning, and I am writing to check whether my code is right.
Here is the architecture used to train on the first dataset. I use transfer learning to take the feature-extraction layers of GoogLeNet and train the model on my custom dataset.

import torch
import torch.nn as nn

class cvv_train(nn.Module):
  def __init__(self, num_classes):
    super(cvv_train, self).__init__()
    # load the pretrained GoogLeNet
    googlenet = torch.hub.load('pytorch/vision:v0.10.0', 'googlenet', pretrained=True)

    # feature extraction: keep everything up to the global average pool
    # (drops the final dropout and fc layers, i.e. pool5-drop_7x7_s1 onward)
    self.feature_extractor = nn.Sequential(*list(googlenet.children())[:-2])

    self.final = nn.Sequential(
            nn.Linear(googlenet.fc.in_features, num_classes),
            nn.BatchNorm1d(num_classes, momentum=0.01)
        )

  def forward(self, x):
    # no gradient/backpropagation through the pretrained backbone
    with torch.no_grad():
        x = self.feature_extractor(x)  # assume: [batch, 1024, 1, 1]
    x = x.view(x.size(0), -1)          # assume: [batch, 1024]
    return self.final(x)               # assume: [batch, num_classes]
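
(An aside, assuming only self.final is meant to be trained: because the backbone runs under torch.no_grad() in forward, it could equivalently be frozen once in __init__, which also lets an optimizer built over model.parameters() skip those weights.)

# inside cvv_train.__init__, after building self.feature_extractor:
for param in self.feature_extractor.parameters():
    param.requires_grad = False  # backbone stays fixed during training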

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = cvv_train(312)
model = model.to(device)
model.train()

PATH = 'path to trained model'

model.load_state_dict(torch.load(PATH, map_location=device))
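
As a quick sanity check (assuming the standard 224x224 RGB input size GoogLeNet expects), a dummy batch can be pushed through the model to confirm the shapes noted in the comments:

# dummy batch: 4 RGB images at GoogLeNet's expected 224x224 resolution
dummy = torch.randn(4, 3, 224, 224, device=device)
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # expected: torch.Size([4, 312])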

Now I will use the trained model (again taking its feature-extraction layers and replacing the last layer).

class cvv_train2(nn.Module):
  def __init__(self, num_classes, model):
    super(cvv_train2, self).__init__()
    # the model already trained on the first dataset
    self.model = model

    # feature extraction: reuse the trained backbone (the first child of
    # self.model is the feature_extractor Sequential; unpack its modules)
    self.feature_extractor = nn.Sequential(*list(self.model.children())[0])

    self.final = nn.Sequential(
            nn.Linear(self.model.final[0].in_features, num_classes),
            nn.BatchNorm1d(num_classes, momentum=0.01)
        )

  def forward(self, x):
    # no gradient/backpropagation through the transferred backbone
    with torch.no_grad():
        x = self.feature_extractor(x)  # assume: [batch, 1024, 1, 1]
    x = x.view(x.size(0), -1)          # assume: [batch, 1024]
    return self.final(x)               # assume: [batch, num_classes]
model2 = cvv_train2(num_classes, model)  # num_classes = number of classes in the second dataset
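
To confirm that the second model really shares the trained backbone rather than a copy (a sketch of mine, relying on the unpacked nn.Sequential reusing the same child modules), the parameter tensors of both models can be compared by identity:

# the unpacked Sequential reuses the same child modules, so the
# underlying Parameter objects should be the very same tensors
params1 = dict(model.feature_extractor.named_parameters())
params2 = dict(model2.feature_extractor.named_parameters())
print(all(params1[name] is params2[name] for name in params1))  # expected: True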

How can I check which weights do not change?
Also, does this seem correct?
The paper I am trying to replicate is Facial Expression Recognition in Videos: An CNN-LSTM based Model for Video Classification; page 2 explains what needs to be done for the training of the CNN.

You could deepcopy the state_dict before a single update step and compare its values to the new state_dict after the parameters were updated to check which ones changed.
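
A minimal sketch of that idea (criterion, optimizer, inputs, and targets are placeholders for whatever you are using):

import copy

# snapshot all parameters and buffers before a single update step
sd_before = copy.deepcopy(model2.state_dict())

# --- one training step; placeholders for your own setup ---
# loss = criterion(model2(inputs), targets)
# loss.backward()
# optimizer.step()

# compare entry by entry after the update
sd_after = model2.state_dict()
for name in sd_before:
    changed = not torch.equal(sd_before[name], sd_after[name])
    print(f"{name}: {'updated' if changed else 'unchanged'}")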

I haven’t checked the architecture in detail, but I assume you’ve verified that no shape mismatch etc. is raised.
It looks generally alright, apart from the last nn.BatchNorm1d as the output layer, which seems uncommon, but that doesn’t necessarily mean it’s wrong.

Thank you for your response.
Another question on the same project: do you have any advice on training on a large dataset with many classes? I am working on part of the VGG-Face dataset and it takes forever for a small decrease in the loss. I would have expected training with GoogLeNet to be faster.

Thank you in advance!