Model(x) not matching head(backbone(x))

Hi, I have a model composed of a backbone (ResNet50) plus a head (a linear layer).
During training the backbone is frozen, so my understanding is that model(x) should equal head(backbone(x)), but it does not!

Before testing, I save the head and backbone separately and then load them as:

head = torch.load('head.pt')
backbone = torch.load('backbone.pt')

model = torch.load('model.pt')

assert model(x) == head(backbone(x))

The final predictions are mostly correct, but there is a significant difference in the raw logit values.

EDIT:
All three (model, backbone, head) were set to eval() before testing, and torch.no_grad() was used.
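i.e., the comparison at test time looks roughly like this (just a sketch, variable names as above):

model.eval()
backbone.eval()
head.eval()

with torch.no_grad():
    out1 = model(x)
    out2 = head(backbone(x))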

This is my forward definition:

import timm
import torch
import torch.nn as nn


class Net(nn.Module):

    def __init__(self, num_classes):
        super().__init__()
        # frozen feature extractor
        self.backbone = timm.create_model('resnet50', num_classes=0).eval()
        # trainable classification head
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x):
        with torch.no_grad():
            x = self.backbone(x)
        x = self.head(x)
        return x

I guess the backbone parameters are being updated somehow, even though it is in eval mode and torch.no_grad is used.

Because this is failing:

timm_backbone = timm.create_model('resnet50', num_classes=0).eval()

assert model.backbone(x) == timm_backbone(x)

Directly comparing floating point values for equality can fail due to the limited floating point precision (visible e.g. when a non-deterministic algorithm is used), so use torch.allclose instead.

How large is this significant change if you print the abs().max() of the difference?
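E.g. something like this, where out1/out2 just stand for the two outputs you are comparing:

out1 = model(x)
out2 = head(backbone(x))

# closeness check with a tolerance instead of exact equality
print(torch.allclose(out1, out2, atol=1e-5))
# magnitude of the largest deviation between the two outputs
print((out1 - out2).abs().max())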

Mostly they have a difference of around 0.2

Thanks for the update!
I don’t know how your models are defined and if you are missing functional API calls or if there is any other issue, but I cannot reproduce the issue using the torchvision ResNet152 by comparing the entire model execution with a per-layer one:

import torch
from torchvision import models

model = models.resnet152()
model.eval()

x = torch.randn(1, 3, 224, 224)

# entire model execution
out1 = model(x)

# per-layer execution
x = model.conv1(x)
x = model.bn1(x)
x = model.relu(x)
x = model.maxpool(x)
x = model.layer1(x)
x = model.layer2(x)
x = model.layer3(x)
x = model.layer4(x)
x = model.avgpool(x)
x = torch.flatten(x, 1)
out2 = model.fc(x)

# compare
print((out1 - out2).abs().max())
> tensor(0., grad_fn=<MaxBackward1>)

Thank you for your quick response!
I think the issue is arising during training: somehow the backbone is getting updated even though it is in eval mode.

class Net(nn.Module):

    def __init__(self, num_classes):
        super().__init__()
        self.backbone = timm.create_model('resnet50', num_classes=0).eval()
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x):
        with torch.no_grad():
            x = self.backbone(x)
        x = self.head(x)
        return x


def train(net, dataloader):
    """model trained with cross-entropy here"""

After training, I only want to save the head. I will do this for a couple of tasks, using a single pretrained frozen backbone for extracting features and the head of each task for getting the probabilities.
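As a rough sketch of what I mean (the file name, backbone name, and num_classes are just placeholders):

# after training: save only the task-specific head
torch.save(net.head.state_dict(), 'head_task1.pt')

# at test time: one shared frozen backbone + a head per task
backbone = timm.create_model('resnet50', pretrained=True, num_classes=0).eval()
head = nn.Linear(2048, num_classes)
head.load_state_dict(torch.load('head_task1.pt'))
head.eval()

with torch.no_grad():
    probs = head(backbone(x)).softmax(dim=-1)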

I’m not sure how the training fits into this use case, since you are comparing the outputs directly in your first post. However, if you think the backbone is updated, check the .grad attributes of the parameters of model.backbone after the backward call and verify that they are set to None.


I verified the two things below:

for p in model.backbone.parameters():
    assert p.grad is None


frozen_backbone = timm.create_model('ssl_resnet50', pretrained=True, num_classes=0).eval()
# model.backbone was also frozen during training
for p, p2 in zip(model.backbone.parameters(), frozen_backbone.parameters()):
    print(torch.all(torch.isclose(p, p2)))

Prints all True

FYI, it turned out that BatchNorm's running-stats tracking was not actually disabled by the .eval() call or by torch.no_grad (most likely because the training loop's net.train() call put the backbone back into train mode, so the running statistics kept updating). After setting BN tracking to False the issue was gone.
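For anyone hitting the same thing, a minimal sketch of an alternative way to keep the backbone's BatchNorm statistics frozen: force the backbone back into eval mode whenever the model is switched to train mode, instead of flipping the tracking flag:

class Net(nn.Module):

    def __init__(self, num_classes):
        super().__init__()
        self.backbone = timm.create_model('resnet50', num_classes=0)
        self.head = nn.Linear(2048, num_classes)

    def train(self, mode=True):
        # net.train() from the training loop would put the backbone back
        # into train mode; keep it (and its BN running stats) in eval mode
        super().train(mode)
        self.backbone.eval()
        return self

    def forward(self, x):
        with torch.no_grad():
            x = self.backbone(x)
        x = self.head(x)
        return x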
