Seeing inconsistent behavior with register_forward_hook on pretrained networks

I have been looking at adding layer-output visualizations to my models and have been working through doing this on pretrained networks, modeled after some approaches I found online.

That said, I found that some of the pretrained Conv2d layers output an activation of 0 when I run a random image through, and while experimenting with why, I found that I would get inconsistent outputs depending on which layers I ran the input through.

import numpy as np
import torch.nn as nn
from torchvision import models

class HookedLayerRunner():
    def __init__(self, model, selected_layer):
        self.model = model
        self.selected_layer = selected_layer
        self.layer_output = None

    def hook_layer(self):
        # forward hooks are called as hook(module, input, output)
        def hook_function(module, inp, out):
            # keep the first item in the batch for the hooked layer
            self.layer_output = out[0]
        self.model[self.selected_layer].register_forward_hook(hook_function)

    def process(self, img):
        self.hook_layer()
        # process_image and device come from elsewhere in my script
        processed_image = process_image(img, device)
        self.model(processed_image)

And here is the code exercising the apparent bug:

filt = 10
model = models.vgg16(pretrained=True)
img = np.uint8(np.random.uniform(150, 180, (100, 100, 3))) / 255.0 - 0.5

# 1) run the full pretrained feature stack with a hook on layer 0
act = HookedLayerRunner(model.features, 0)
act.process(img)
print(act.layer_output[filt].mean())

# 2) run layer 0 on its own; the hook registered above is on this same
#    module object, so it fires again and updates act.layer_output
t = process_image(img, device)
conv = model.features[0]
res = conv(t)
print(act.layer_output[filt].mean())

# 3) run layer 0 followed by layer 1 (a ReLU) in a fresh nn.Sequential
conv = nn.Sequential(model.features[0], model.features[1])
res = conv(t)
print(act.layer_output[filt].mean())

And the results, showing an inconsistency depending on whether the first conv layer runs inside an nn.Sequential (as it does in the pretrained vgg16) or the Conv2d layer runs as a singleton:

tensor(0.7511, grad_fn=<MeanBackward0>)
tensor(0.7456, grad_fn=<MeanBackward0>)
tensor(0.7511, grad_fn=<MeanBackward0>)

Any thoughts on what is going on here? Am I just using the hook incorrectly, or is there some behavior of the hooked forward output that depends on the subsequent layers (a ReLU in this case)?
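
For reference, here is how I understand register_forward_hook is supposed to behave, sanity-checked on a toy Conv2d outside of VGG (a sketch only; the toy layer, the captured dict, and the printed shapes are made up for illustration rather than taken from my actual script):

import torch
import torch.nn as nn

# toy layer purely to check the hook signature: a forward hook receives
# (module, input, output), where input is a tuple of the inputs and
# output is the tensor the module just returned
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
captured = {}

def hook_function(module, inp, out):
    captured["output"] = out       # reference to the module's output tensor
    captured["shape"] = out.shape

handle = conv.register_forward_hook(hook_function)

x = torch.randn(1, 3, 100, 100)
y = conv(x)

print(captured["shape"])                  # torch.Size([1, 8, 100, 100])
print(torch.equal(captured["output"], y)) # True immediately after the forward pass

handle.remove()  # hooks stay registered until the handle is removed

If that matches the intended semantics, I would expect the value I grab in the hook to be fixed at the moment the hooked module returns.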

I tried different layer combinations and found some interesting differences. The issue seems to arise only when the layer following the conv in the sequential is a ReLU. Output from the different models I ran:

Whole Model                          0.7501
Sequential(Conv2D)                   0.7446
Sequential(Conv2D, ReLU)             0.7501
Manual sequential1 (Conv2D, ReLU)    0.7446
Manual sequential2 (Conv2D, ReLU)    0.7501
Sequential(Conv2D, Conv2D, ReLU)     0.7446

I can’t figure out how putting a ReLU after the convolution would change the output of the convolution as captured by the hook. Even manually running the sequence of layers seems to exercise the issue…
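
The next thing I plan to check is whether the tensor I capture in the hook is itself being modified after the hook fires. Here is a sketch of that check (the live/snapshot names and the comparison logic are just for illustration, not part of my original script):

import torch
from torchvision import models

model = models.vgg16(pretrained=True).eval()
captured = {}

def hook_function(module, inp, out):
    captured["live"] = out[0]                       # the tensor the model keeps using
    captured["snapshot"] = out[0].detach().clone()  # frozen copy taken at hook time

handle = model.features[0].register_forward_hook(hook_function)

x = torch.randn(1, 3, 100, 100)
with torch.no_grad():
    model.features(x)

# if this is nonzero, the hooked output was modified after the hook ran
diff = (captured["live"] - captured["snapshot"]).abs().max()
print(diff)

handle.remove()

If the two copies disagree only when a ReLU follows the conv, that would at least narrow down where the change is coming from.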