Conv layers not being properly frozen?

Hello,
I’m having some trouble freezing layers of a pretrained network in order to learn a new head. My forward pass code is as follows:

def forward(self, x):
        x = self.conv1(x)
        x = self.blocks(x)
        x = self.conv2(x)

        a = self.linear7(x)
        a = self.linear1(a)
        a = a.view(a.size(0), -1)

        if self.transfer:
            # new head for the transfer task (conv3, conv4, fc1, fc2)
            x = self.conv3(x)
            x = self.conv4(x)
            x = x.view(x.size(0), -1)
            
            x1 = torch.softmax(self.fc1(x), dim=1)
            x2 = torch.softmax(self.fc2(x), dim=1)

            return a, x1, x2

        return a

What I want to do is learn the conv3, conv4, fc1 and fc2 layers for a new task, using the features extracted by the previously trained weights in all the other layers.
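In other words, the split I’m aiming for looks roughly like this (just a sketch of the idea, not my actual code — in particular the optimizer line is an assumption, since my real optimizer setup isn’t shown below):

# Backbone (conv1, blocks, conv2, linear7, linear1): loaded from the checkpoint, frozen.
# New head (conv3, conv4, fc1, fc2): trainable for the new task.
head_params = [p for n, p in model.named_parameters()
               if n.split('.')[0] in ('conv3', 'conv4', 'fc1', 'fc2')]
optimizer = torch.optim.SGD(head_params, lr=0.01)  # assumed optimizer; only sees the new head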

Relevant parts of the training code are:

if args.resume:
	to_train = ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias', 'conv3.weight', 'conv3.bias', 'conv4.weight', 'conv4.bias']
	model.load_state_dict(torch.load(args.resume)['backbone_net_list'], strict=False)
	# Freeze layers
	for name, param in model.named_parameters():
		if name not in to_train:
			param.requires_grad = False
for batch_idx, (img, labels) in enumerate(train_loader):
		
		img = img.to(model.device)
		gt1 = labels['1'].to(model.device)
		gt2 = labels['2'].to(model.device)
		# Forward pass
		a, x1, x2 = model(img.float())

		# Cost function
		cost = utils.cost_fn(gt1, x1, gt2, x2)
		# Backprop
		optimizer.zero_grad()
		cost.backward()
		# Update parameters
		optimizer.step()
		torch.save(model.state_dict(), os.path.join(args.output, 'model-{}.pth'.format(epoch)))
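
After that loop, a quick sanity check (a debugging snippet I can run, not part of the training script itself) to confirm which parameters are still trainable would be:

for name, param in model.named_parameters():
    print(name, 'trainable' if param.requires_grad else 'frozen')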

(I’m saving the model after just one backprop step so that I can inspect how the weights are updated.)
My problem is that even after just one iteration, if I load the saved model and compare its output for the ‘a’ variable against that of the original model, I get different results. If my reasoning is correct, I should get the same output, given that the weights ‘a’ depends on are frozen and it never passes through the new head, right? What am I doing and/or thinking wrong here?
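
For reference, the comparison itself looks roughly like this (a sketch — the class name MyModel, its constructor, the checkpoint file name and the input shape are placeholders rather than my exact code):

import torch

# original model, loaded from the pretrained checkpoint
model_before = MyModel(transfer=True)
model_before.load_state_dict(torch.load(args.resume)['backbone_net_list'], strict=False)

# model re-loaded from the file saved after one optimizer step
model_after = MyModel(transfer=True)
model_after.load_state_dict(torch.load('model-0.pth'))

model_before.eval()
model_after.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)  # placeholder input shape
    a_before, _, _ = model_before(x)
    a_after, _, _ = model_after(x)

print(torch.allclose(a_before, a_after))  # I expected True, but I get False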