How can I extract intermediate layer output from a loaded CNN model?

After training my own CNN model and loading it, I want to extract the features of an intermediate layer. Here’s my CNN model and code.

Convolutional Neural Net

import torch
import torch.nn as nn
import torch.nn.functional as F

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.conv1_1 = nn.Conv2d(in_channels = 3, out_channels = 16, kernel_size = 11, stride = 3)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2_1 = nn.Conv2d(in_channels = 16, out_channels = 32, kernel_size = 7, stride = 2)
        self.bn2 = nn.BatchNorm2d(32)
        self.pool1 = nn.MaxPool2d(2, 2)
        
        self.conv3_1 = nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size = 5, stride = 1)
        self.conv3_2 = nn.Conv2d(in_channels = 64, out_channels = 64, kernel_size = 5, stride = 1)
        self.bn3 = nn.BatchNorm2d(64)
        self.pool2 = nn.MaxPool2d(2, 2)
        
        self.conv4_1 = nn.Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1)
        self.conv4_2 = nn.Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1)
        self.conv4_3 = nn.Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1)
        self.bn4 = nn.BatchNorm2d(128)
        self.pool3 = nn.MaxPool2d(2, 2)
              
        self.fc1 = nn.Linear(128*5*5, 1000)
        self.fc2 = nn.Linear(1000, 1000)
        self.fc3 = nn.Linear(1000, 128)
        self.out = nn.Linear(128, 1)
        
        # activation, batch normalization
        self.prelu = nn.PReLU()
        self.bn0 = nn.BatchNorm1d(1000)
        
        # dropout
        self.dropout2d = nn.Dropout2d(0.25)
        self.dropout1d = nn.Dropout(0.5)
        
    def forward(self, x):
        x = self.bn1(F.relu(self.conv1_1(x)))
        x = self.bn2(F.relu(self.conv2_1(x)))
        x = self.dropout2d(self.pool1(x))
        
        x = self.bn3(F.relu(self.conv3_1(x)))
        x = self.bn3(F.relu(self.conv3_2(x)))
        x = self.dropout2d(self.pool2(x))
        
        x = self.bn4(self.prelu(self.conv4_1(x)))
        x = self.bn4(self.prelu(self.conv4_2(x)))
        x = self.bn4(self.prelu(self.conv4_3(x)))       
        x = self.dropout2d(self.pool3(x))
        x = x.view(-1, 128*5*5)
        
        x = self.dropout1d(F.relu(self.fc1(x)))
        x = self.dropout1d(F.relu(self.fc2(x)))
        x = F.relu(self.fc3(x))
        out = self.out(x)
        
        return out
    
net = net()

Load Model

model = net.to('cuda:0')
num_model = 10

model.load_state_dict(torch.load('C:/Users/KIMSUNGHUN/Documents/TCGA-GBM/model4/params_{}.pt'.format(num_model)))
model.eval()

After loading a specific pre-trained model I saved, I want to freeze the parameters of the model and extract only the features of the last fully connected layer. The output should have shape (n, 128).
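
Roughly, by freezing I mean something like this sketch, applied to the loaded model:

for param in model.parameters():
    param.requires_grad_(False)   # parameters stay fixed, no gradients are tracked
model.eval()                      # keep batch norm and dropout in eval mode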

Is there a good way to do this with models created and loaded through “nn.Module”?

You could register a forward hook on model.fc3 as described here, or alternatively manipulate the forward method and return the output activation of self.fc3 as well as the one from self.out.
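
For the second approach, a minimal sketch of the modified forward (only the final lines differ from your posted model; the convolutional blocks stay as they are):

def forward(self, x):
    # ... convolutional blocks and x.view(-1, 128*5*5) unchanged ...
    x = self.dropout1d(F.relu(self.fc1(x)))
    x = self.dropout1d(F.relu(self.fc2(x)))
    fc3_out = self.fc3(x)        # (n, 128) features of the last hidden FC layer
    x = F.relu(fc3_out)
    out = self.out(x)
    return out, fc3_out          # unpack with: out, features = model(input)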

Thanks for the answer. I’ve tried the first method you suggested, and it worked!
By the way, I’m curious: is it possible for negative values to appear even after passing through the ReLU function?

Extractor

activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook
model.fc3.register_forward_hook(get_activation('fc3'))
output = model(x)
activation['fc3']

tensor([[ -6.8134, -9.0775, -5.5216, 7.8066, -8.0170, -6.4616, -7.9551,
-10.1412, -7.9658, -0.5005, -8.6268, -3.1905, -9.5662, 5.7462,
-7.9711, -11.6978, -9.5338, -3.5561, -8.5417, -9.5329, -6.0198,
-6.1909, -7.8344, -6.3041, -7.9333, -9.6809, -4.8881, -8.7193,
-6.9742, -7.7307, -6.6041, -12.6779, -6.9386, -8.6367, -9.9166,
-6.2529, -9.1278, -7.4166, -6.9962, -10.5488, -9.0119, 5.0765,
-11.8163, -12.2532, -3.2899, -10.9596, -5.1833, -7.9950, -8.2713,
-4.9636, -1.9635, -9.4730, -8.7698, -9.3743, -8.4019, -6.0198,
-11.1968, -9.3805, -8.0975, -7.8172, -6.3183, -10.1746, -8.0311,
-8.4333, -9.8825, -12.8623, -8.6050, -5.4799, -10.6940, -8.5273,
-8.9511, -7.6086, -7.7462, -7.8799, -10.8322, -2.3653, -2.5826,
-2.2110, -8.2639, -10.9057, -8.5005, -5.1672, -11.6788, -8.2162,
-10.8182, -11.0485, -7.9058, -10.0653, -9.7719, -10.3265, -6.8779,
-8.0053, -7.7753, -8.8138, 21.4658, -9.3669, -9.3695, -8.1936,
-9.8124, -10.6773, -8.2539, -10.2004, -8.5175, -9.4678, -8.3933,
-1.1410, -8.9344, -5.1007, -8.7830, -8.8065, -7.1228, -9.0213,
-3.4798, -8.1264, -7.5717, -0.0733, -13.1373, -10.7214, -6.4440,
-10.8441, -10.6284, -7.5801, -9.7635, -7.8054, -8.3500, -9.1799,
-10.1877, -9.4926]], device='cuda:0')

Good to hear it’s working!
model.fc3.register_forward_hook(get_activation('fc3')) will give you the output of model.fc3, not of the F.relu in your forward method, so it’s plausible that some values are negative.
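
If you want the post-ReLU features instead, one option is to apply the same nonlinearity to the stored tensor, e.g. (a minimal sketch reusing the activation dict from your snippet):

import torch.nn.functional as F

fc3_pre_relu = activation['fc3']       # raw nn.Linear output, can contain negatives
fc3_post_relu = F.relu(fc3_pre_relu)   # same values that self.out receives in forward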

Then I need to apply the ‘relu’ function to the current values to get what I want.
Many of the values will disappear :pleading_face:

That might be the case, but as long as your model is training well, it shouldn’t be a problem.
Or what is your use case that requires a lot of positive values?

Hello,
One thing I want to point out: if we use the default torchvision models (such as vgg16),
the ReLU layers have inplace=True.
So even if I try to access a convolution layer (by layer number, such as layer 17 or layer 28) before the ReLU,
this hook method seems to return the results after the corresponding ReLU (i.e., negative values become zero).
For example, layer 17 will actually return the results of layer 18 (layer 17 + ReLU).
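
A minimal sketch of a workaround for this situation (using torchvision's vgg16, where features[17] is the conv layer referred to above; treat the exact index as an assumption): clone the output inside the hook so the inplace ReLU that runs afterwards cannot overwrite the stored values.

import torch
import torchvision.models as models

activation = {}
def get_activation(name):
    def hook(model, input, output):
        # clone so the following inplace ReLU cannot overwrite the stored values
        activation[name] = output.detach().clone()
    return hook

vgg = models.vgg16().eval()
vgg.features[17].register_forward_hook(get_activation('conv17'))

with torch.no_grad():
    _ = vgg(torch.randn(1, 3, 224, 224))
print(activation['conv17'].min())  # negative values survive thanks to the clone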

Hello @leopardyao, @ptrblck, if this is the case, how can we get the output of a conv layer followed by batch norm without the ReLU operation applied?

Consider this piece of the architecture in ResNet50:

(2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )

Here, I need to get the output after the batch norm but before the ReLU operation, so I thought I could do something like this:

model = Resnet50() 
model.load_state_dict(torch.load('path_to_model.bin'))
model.to(device) 

# now using a forward hook, I was able to get the output after the conv3 layer like this:
model.model.layer4[1].conv3.register_forward_hook(get_activation("some_key_name"))

Now, using the output stored in the activation dict, I applied the batch norm operation to it like this:

model.model.layer4[1].bn3(activation['some_key_name'])

But when I compare these values with the output hooked directly from the bn3 layer (where the negatives have been zeroed out by the ReLU), some of them do not match.

For example, this is what I get when I apply bn3 to the hooked conv3 output:

A =
[[-0.2502, -0.4636, -0.4095, -0.3121],
 [-0.5342, -0.4120, -0.0899, -0.0620],
 [-0.1114, -0.1497, -0.1434,  0.0654],
 [-0.1596,  0.2669,  0.4669,  0.1378]]

This is obtained by directly hooking the bn3 layer, i.e.:

model.model.layer4[1].bn3.register_forward_hook(get_activation("some_key_name_2"))
B =
[[0.        , 0.        , 0.        , 0.        ],
 [0.        , 0.        , 0.        , 0.89255244],
 [0.        , 0.        , 0.03623176, 0.34344062],
 [0.        , 0.2668974 , 0.5831858 , 0.13784815]]

Usually it should hold that B = F.relu(A); however, some of the items do not match. How can I rectify this?

JFYI,

  1. I have only shown a small section of A and B; it does not represent the whole matrix.
  2. It is model.model because I am wrapping the torchvision ResNet model in a class.

I’m unsure if this behavior was changed or if I’m missing something, but the forward hook on the previous layer seems to return the expected values without the (inplace) relu applied:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, inplace=False):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(10, 10)
        self.relu = nn.ReLU(inplace=inplace)
        
    def forward(self, x):
        x = self.relu(self.fc(x))
        return x


x = torch.randn(1, 10)

# out of place
model = MyModel()
model.fc.register_forward_hook(lambda m, input, output: print(output))
out = model(x)

# inplace
model = MyModel(inplace=True)
model.fc.register_forward_hook(lambda m, input, output: print(output))
out = model(x)

So when I ran the code you pasted, @ptrblck, the outputs were different. Are you saying that there might be some issue?

Output:

tensor([[ 0.5963,  0.1013, -1.2797, -0.4520, -0.7417, -0.5781, -0.0526,  0.9478,
         -1.3636, -0.2064]], grad_fn=<AddmmBackward>)
tensor([[ 0.4647, -0.3197, -0.1019,  0.2178, -0.7557,  0.3196,  0.7327,  0.0902,
         -0.2639,  0.0036]], grad_fn=<AddmmBackward>)

But there seems to be some issue here: self.relu has not been applied to the first output (let’s call that A, and the second one, which has inplace set to True, B). Why can’t I get back B if I apply relu to A, i.e.,

B = self.relu(A)

Am I missing anything, or is there a bug in the codebase?

The code illustrates that the forward hook registered on model.fc returns the “pre-relu” activation, since negative values are shown. Since my code snippet creates two different modules, the parameters will also be randomly initialized. If you want to get the same output, you could load the state_dict of the first model into the second one:

x = torch.randn(1, 10)

# out of place
model = MyModel()
sd = model.state_dict()
model.fc.register_forward_hook(lambda m, input, output: print(output))
out = model(x)

# inplace
model = MyModel(inplace=True)
model.load_state_dict(sd)
model.fc.register_forward_hook(lambda m, input, output: print(output))
out = model(x)

Hi @ptrblck, I think the issue is with output.detach(). I am unsure how it is causing an issue, because I ran these experiments:

  1. Using a dictionary to store the activations:
activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

When I use the above method, I see a lot of zeros in the activations, which means the stored output has already been through the ReLU activation.

  2. Using a lambda function:
model.fc.register_forward_hook(lambda m, input, output: print(output))

When I try the same thing with the lambda-function approach you mentioned recently, I clearly see negative values in the output.

I believe there is some issue with output.detach()? Could you weigh in with your thoughts?

If you want to store the output activation and make sure it’s not changed by the following inplace operation, you could clone() it before adding it to the dict.
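
Applied to the ResNet example above, a minimal sketch (shown with a plain torchvision resnet50, so the path is resnet.layer4[1].bn3 rather than the wrapped model.model):

import torch
import torchvision.models as models

activation = {}
def get_activation(name):
    def hook(model, input, output):
        # clone before the block's inplace residual add / ReLU can change the tensor
        activation[name] = output.detach().clone()
    return hook

resnet = models.resnet50().eval()
resnet.layer4[1].bn3.register_forward_hook(get_activation('bn3'))

with torch.no_grad():
    _ = resnet(torch.randn(1, 3, 224, 224))
print(activation['bn3'].min())  # pre-ReLU values, negatives are kept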

Hi @ptrblck, thanks, I was able to get the proper activations. However, I am not completely convinced about the use of detach or clone.

Please confirm whether my assumption is correct: is detach() used to remove the hook once the forward_hook() is done for an intermediate layer? I did see that, when I iterated to get the next layer’s activation, I also got the output from the first hook when detach() was not used.

Secondly, is clone() used to just clone the entire model as is? I tried with both output and output.detach() in the hook function, and both returned values after the in-place operation was applied.

activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.clone().detach()
    return hook

This is the function that I used, and if my understanding is correct, this should do the trick.

No, detach() is not used for hooks; it is a tensor operation, which detaches the tensor from the computation graph, i.e. Autograd will stop at this detached point during backpropagation.

No, clone() is also a tensor operation and will create a new tensor with the same data. The returned tensor will not be changed by inplace operations.

# without clone
x = torch.zeros(10)
y = x
x[0] = 1.
print(x)
> tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
print(y)
> tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

# with clone
x = torch.zeros(10)
y = x.clone()
x[0] = 1.
print(x)
> tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
print(y)
> tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Also, is it possible to load the pre-trained weights provided by torchvision into the above modified ResNet model?

Thanks for the information

@ptrblck Can we take this topic a little further? Say I want to get the intermediate results of all the operations in the graph during forward propagation. This is quite simple to do in TensorFlow: you can

  1. Load the graph;
  2. Get all tensors from the graph, e.g.
     all_ops = graph.get_operations()
     all_op_names = [op.name for op in all_ops]
     all_tensors = [graph.get_tensor_by_name('{}:0'.format(op_name)) for op_name in all_op_names]
  3. For tf 1.15, which I currently use a lot, run output = session.run(all_tensors, feed_dict={d_input: input_images}) to get all the outputs.

Can we do the same thing here in PyTorch? The reason I ask is that I want to run the same model on two different platforms and check that all operations produce the same results; if some intermediate result differs, I can spot it.
Another question: can we get all operations in a training graph, including all the backprop-related operations?
Much appreciated.

Yes, you could iterate over .named_modules() (or a combination of .named_children() etc.) and register a forward hook on each module, which would then store the outputs in the dict, if you are using the previously posted approach.
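
A minimal sketch of that idea (the toy model here is just a stand-in; hooking only leaf modules is an assumption about what you want to record):

import torch
import torch.nn as nn

activation = {}
def get_activation(name):
    def hook(module, input, output):
        activation[name] = output.detach().clone()
    return hook

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)

# register one hook per leaf module (modules without children)
for name, module in model.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(get_activation(name))

out = model(torch.randn(1, 3, 16, 16))
print(list(activation.keys()))  # one entry per hooked layer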

Thanks. I’m wondering if we could dump model structures out into a file, because I want to write a separate script that directly reads any model structure file together with the weights in a .pth file, loads data, and registers forward hooks to get the output of all layers. That way I could read e.g. YOLOV3, Mask-RCNN, or resnet50 models without having to modify my YOLOV3 / Mask-RCNN / resnet50 project source files to register a forward hook every time. If you know what I mean, is this possible?