Modifying output of middle layers in PyTorch

Hello, I am a new PyTorch user trying to modify the output of the activation function in the last layer before the FC layer in resnet18. I am using a forward hook, but I am not sure if it is the right method. I want to change the output of the layer using an external variable (which I pass to the hook during the training phase) and continue training with the new output. However, the modified output does not seem to affect the training phase, even though it reaches the hook function.
Any opinion would be appreciated.

Your idea seems to work in my minimal code snippet, and you can see that both the output and the gradients are affected by my scaling:

import torch
from torchvision import models

model = models.resnet18()

x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.sum())
# tensor(-4.1408, grad_fn=<SumBackward0>)

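# Returning a tensor from a forward hook replaces the module's output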
model.avgpool.register_forward_hook(lambda module, args, output: output * 100000.)
out = model(x)
print(out.sum())
# tensor(-509981.3750, grad_fn=<SumBackward0>)

out.mean().backward()
print([p.grad.abs().sum() for p in model.parameters()])
# [tensor(4310105.), tensor(6621.5508), tensor(1928.0602), tensor(1945747.), tensor(3376.6890), tensor(2727.0542), tensor(1617620.7500), tensor(4076.4431), tensor(1723.3049), tensor(1360212.2500), tensor(2601.7117), tensor(2465.5730), tensor(1140538.5000), tensor(2333.1758), tensor(1146.3984), tensor(2520872.), tensor(3527.0349), tensor(3388.9497), tensor(2959968.7500), tensor(3270.3921), tensor(2517.4749), tensor(228854.8906), tensor(3325.2280), tensor(2517.4749), tensor(2747826.2500), tensor(2659.8757), tensor(2339.7527), tensor(2284569.), tensor(2155.9722), tensor(1756.6826), tensor(4316460.), tensor(3172.9500), tensor(2899.5034), tensor(5036753.), tensor(3064.1797), tensor(2210.6892), tensor(402473.4688), tensor(3313.6052), tensor(2210.6892), tensor(4720320.), tensor(2539.8877), tensor(2057.7183), tensor(3956438.5000), tensor(1998.9598), tensor(1444.0149), tensor(7319756.5000), tensor(2764.3674), tensor(2484.2786), tensor(8468459.), tensor(9157.4580), tensor(14000.2422), tensor(655827.5625), tensor(9085.8076), tensor(14000.2422), tensor(6723392.5000), tensor(1799.7094), tensor(1601.7231), tensor(4804520.), tensor(10960.4609), tensor(22938.2988), tensor(42316260.), tensor(1.0000)]
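
If you need to pass an external variable into the hook, as described in the question, one common pattern is functools.partial; the sketch below is illustrative (scale_hook and scale are made-up names, not a PyTorch API):

import functools

import torch
from torchvision import models

def scale_hook(module, input, output, scale):
    # The returned tensor replaces the module's original output
    return output * scale

model = models.resnet18()
# Bind the extra argument up front; PyTorch still calls the hook
# with the usual (module, input, output) signature
hook = model.avgpool.register_forward_hook(
    functools.partial(scale_hook, scale=100000.)
)
out = model(torch.randn(1, 3, 224, 224))
hook.remove()  # detach the hook once it is no longer needed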

Thank you so much for replying. Here is my code: even when I multiply the new output (modified_output) by zero, the model overfits after processing some batches in the first epoch.
Would you please take a look at my code to see if I am using the hook correctly?


def forward_hook(module, input, output, additional_variable):
    modified_output = output * additional_variable
    print(f"Modified output shape: {modified_output.shape}")

# rest of the code

# Load a pre-trained ResNet model
resnet_model = models.resnet18(pretrained=True)
num_fc = resnet_model.fc.in_features
resnet_model.fc = nn.Linear(num_fc, 2)

# Find the target activation function layer
target_layer_name = 'layer4.1.relu'  
target_activation_function = None


for name, module in resnet_model.named_modules():
    if name == target_layer_name:
        target_activation_function = module
        break

if target_activation_function is not None:
    # Attach the forward hook to the target activation function
    hook = target_activation_function.register_forward_hook(
        lambda module, input, output: forward_hook(module, input, output, additional_variable)
    )
# rest of the code
# Pass the inputs (the images) through the model inside the training loop
outputs = resnet_model(inputs)
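
One likely culprit in the snippet above: forward_hook computes modified_output but never returns it, so the layer's original output is what continues through the network, and the multiplication has no effect on training. A forward hook only replaces a module's output when it returns the new tensor. A minimal corrected sketch, keeping the names from your code:

def forward_hook(module, input, output, additional_variable):
    modified_output = output * additional_variable
    print(f"Modified output shape: {modified_output.shape}")
    # Returning the tensor is what makes it the layer's new output;
    # returning nothing (None) leaves the original output untouched
    return modified_output

Note also that torchvision's BasicBlock reuses the same relu module twice in its forward pass, so a hook registered on 'layer4.1.relu' may fire twice per forward; if that is unintended, hooking a module that is used only once (e.g. layer4.1.bn2 or avgpool) avoids it.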