Why does limiting the output layer make accuracy drop?

I have 5 classes and I'm using a DenseNet-121.

In scenario A, I replace the output layer as follows:

import torch
import torch.nn as nn
from torchvision import models

def get_trainable(model_params):
    return (p for p in model_params if p.requires_grad)

model = models.densenet121(pretrained=True)
num_ftrs = model.classifier.in_features
model.classifier = nn.Linear(num_ftrs, 5)  # replace the 1000-way ImageNet head with a 5-way one

optimizer = torch.optim.Adam(
    get_trainable(model.parameters()),
    lr=0.001,
)
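
For reference, a quick check of the replaced head (in_features is 1024 for torchvision's DenseNet-121):

print(model.classifier)
# Linear(in_features=1024, out_features=5, bias=True)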

In scenario B, I leave the default output layer in place:

def get_trainable(model_params):
    return (p for p in model_params if p.requires_grad)

model = models.densenet121(pretrained=True)  # classifier left as the default head

optimizer = torch.optim.Adam(
    get_trainable(model.parameters()),
    lr=0.001,
)
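
For comparison, the untouched model still carries the stock ImageNet classifier, so scenario B is training a 1000-way head rather than a 5-way one; and since nothing is frozen in either scenario, get_trainable returns every parameter in both:

print(model.classifier)
# Linear(in_features=1024, out_features=1000, bias=True)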

With the replaced 5-class head (scenario A), the model plateaus at 76% accuracy after 20 epochs.

With the default head (scenario B), I'm already at 76% accuracy by the 2nd epoch, and it goes up to 82%.

Can someone explain why the last layer has such an influence when everything else is the same?