Visualizing response map using saved best model

I am training a model on the CIFAR-10 dataset, and after every epoch I compare its testing accuracy with the current best testing accuracy. Whenever the testing accuracy is higher than the best one, I save both the model and the optimizer. After that I resume training from the saved checkpoint.
Now, whenever I save the best model, I want to visualize the response map of a layer using that saved checkpoint (e.g. ‘./model_checkpoint.pth’). What is the right approach for this task?

I am not sure I understand the question.
Isn't loading the model, taking one sample, and getting the response map during the forward pass what you want?
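
For reference, a minimal sketch of that idea: reload the saved weights, register a forward hook on the layer of interest, and run one sample through the model. `TinyNet`, its layer name `conv`, and the temporary checkpoint path are placeholders invented for this example; only the `model_state_dict` key follows the checkpoint layout used in this thread.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.head = nn.Linear(4, 10)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.head(x.mean(dim=(2, 3)))


# Save a checkpoint the way a training loop would.
model = TinyNet()
ckpt_path = os.path.join(tempfile.mkdtemp(), "model_checkpoint.pth")
torch.save({"model_state_dict": model.state_dict()}, ckpt_path)

# Reload it into a fresh instance.
restored = TinyNet()
checkpoint = torch.load(ckpt_path, map_location="cpu")
restored.load_state_dict(checkpoint["model_state_dict"])
restored.eval()

# Capture the layer output during the forward pass.
activation = {}


def save_output(module, inputs, output):
    activation["conv"] = output.detach()


handle = restored.conv.register_forward_hook(save_output)
sample = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized image
with torch.no_grad():
    restored(sample)
handle.remove()  # avoid stacking hooks if this runs after every epoch

response_map = activation["conv"][0]  # shape [4, 32, 32]
```

Removing the hook afterwards keeps the model unchanged when training resumes.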

I am actually using the dynamic filter concept to generate the response maps with a dynamic convolution (DC) layer. I have 10 classes, and the DC layer generates one response map per class.
Every time, after training the model, I evaluate its accuracy and save the best model. After this I want to load the saved model again, visualize the response maps, and finally resume training from the best saved checkpoint.
Below is the code that tries to implement dynamic filter generation, but I cannot figure out how to use it for visualization.

import torch
import torch.nn as nn
import torchvision.models as models
import random

class BaseModel(nn.Module):
    def __init__(self, args, classes):
        super(BaseModel, self).__init__()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.args = args
        self.class_num = 10
        self.classes = classes

        ### initialize the BaseModel ---------------------

        ## YOUR CODE HERE
        self.embeddings = nn.Embedding(self.class_num, 128)
        self.wt = nn.Sequential(
                nn.Linear(128, 256),
                nn.Linear(256, 64))
        resnet18 = models.resnet18(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet18.children())[0:5])
        self.dc = nn.Conv2d(64, 1 , kernel_size = 1, stride=1, padding=1)
        self.mlp = nn.Sequential(
                nn.Linear(64, 64),
                nn.Linear(64, 1))

        ### ----------------------------------------------

    def forward(self, imgs):

        ### complete the forward path --------------------

        cls_scores = torch.tensor((self.args.batch_size, self.class_num))
        ## YOUR CODE HERE
        for i in range(len(self.classes)):
            v = self.backbone(imgs)
            class_out = torch.tensor(i)
            class_out = self.embeddings(class_out) 
            w = self.wt(class_out)
            w = torch.reshape(w, (1,w.shape[0],1,1))
            self.dc.weight = nn.Parameter(w)
            v_out = self.dc(v)
            v_out = v_out/8
            out = nn.Upsample(size=(8, 8), mode='bilinear')(v_out)

            out = out.view(-1,1*8*8)
            score = self.mlp(out)
            if (i==0):
                cls_scores = score
            if (i>=1):
                cls_scores = torch.cat((cls_scores, score), 1)
        ### ----------------------------------------------

        return cls_scores # Dim: [batch_size, 10]

import os
import time
import torch
import numpy as np
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

criterion = torch.nn.CrossEntropyLoss()

def train(args, model, optimizer, dataloaders, classes):
    trainloader, testloader = dataloaders

    best_testing_accuracy = 0.0

    # training
    for epoch in range(args.epochs):

        batch_time = time.time(); iter_time = time.time()
        for i, data in enumerate(trainloader):

            imgs, labels = data
            imgs, labels = imgs.to(device), labels.to(device)
            optimizer.zero_grad()
            cls_scores = model(imgs)
            loss = criterion(cls_scores, labels)
            loss.backward()
            optimizer.step()

            if i % 100 == 0 and i != 0:
                print('epoch:{}, iter:{}, time:{:.2f}, loss:{:.5f}'.format(epoch, i,
                    time.time()-iter_time, loss.item()))
                iter_time = time.time()
        batch_time = time.time() - batch_time
        print('[epoch {} | time:{:.2f} | loss:{:.5f}]'.format(epoch, batch_time, loss.item()))

        if epoch % 1 == 0:
            testing_accuracy = evaluate(model, testloader)
            print('testing accuracy: {:.3f}'.format(testing_accuracy))

            if testing_accuracy > best_testing_accuracy:
                ### compare the previous best testing accuracy and the new testing accuracy
                ### save the model and the optimizer --------------------------------

                best_testing_accuracy = testing_accuracy

                ## YOUR CODE HERE
                torch.save({
                        'epoch': epoch,
                        'model_state_dict': model.state_dict(),
                        'optimizer_state_dict': optimizer.state_dict(),
                        'loss': loss }, './model_checkpoint.pth')
                ### -----------------------------------------------------------------
                print('new best model saved at epoch: {}'.format(epoch))
                dataiter = iter(trainloader)
                images, labels = next(dataiter)
                model, optimizer = resume(model, optimizer, images.to(device), labels.to(device))

def evaluate(model, testloader):
    total_count = torch.tensor([0.0]); correct_count = torch.tensor([0.0])
    for i, data in enumerate(testloader):
        imgs, labels = data
        imgs, labels = imgs.to(device), labels.to(device)
        total_count += labels.size(0)

        with torch.no_grad():
            cls_scores = model(imgs)

            predict = torch.argmax(cls_scores, dim=1)
            correct_count += (predict == labels).sum()
    testing_accuracy = correct_count / total_count
    return testing_accuracy.item()

def resume(model, optimizer, image, label):
    checkpoint_path = './model_checkpoint.pth'
    assert os.path.exists(checkpoint_path), ('checkpoint does not exist at %s' % checkpoint_path)

    ### load the model and the optimizer --------------------------------

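    checkpoint = torch.load(checkpoint_path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])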
    ### -----------------------------------------------------------------

    print('Resume completed for the model\n')
    # visualize feature map
    activation = {}
    def get_activation(name):
        def hook(model, input, output):
            activation[name] = output.detach()
        return hook
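    # register the hook on the dynamic convolution layer before the forward pass
    model.dc.register_forward_hook(get_activation('dc'))
    output = model(image)

    act = activation['dc'].squeeze()
    fig, ax = plt.subplots(act.size(0))
    for idx in range(act.size(0)):
        ax[idx].imshow(act[idx].cpu())
    plt.show()
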
    return model, optimizer

def imshow(image, label, ax=None, title=None):
    if ax is None:
        fig, ax = plt.subplots()
    if title:
        ax.set_title(title)
    # expects a NumPy array of shape (C, H, W)
    image = image.transpose((1, 2, 0))
    # Undo preprocessing
    mean = np.array([0.5, 0.5, 0.5])
    std = np.array([0.5, 0.5, 0.5])
    image = std * image + mean
    # Image needs to be clipped between 0 and 1
    image = np.clip(image, 0, 1)
    ax.imshow(image)
    return ax

So you mean the resume() function is not doing what you want? It seems to be plotting the activations, no?

Unrelated note: when you do self.dc.weight = nn.Parameter(w), you actually detach w implicitly. That means that no gradient will ever flow back to your self.embeddings and self.wt.
I guess this is not what you want, right?
To fix this, you should delete the weights of the convolution after initialization so that they are not registered as learnable parameters:

self.dc = nn.Conv2d(64, 1 , kernel_size = 1, stride=1, padding=1)
del self.dc.weight # The forward pass will set this appropriately
# Also do you want to change self.dc.bias? You seem to ignore it here.

Then during the forward, you can set the weights without using a Parameter:

self.dc.weight = w
v_out = self.dc(v)
# The next line is not strictly necessary but
# it is a good practice to avoid having side effects in forward method
del self.dc.weight
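
A small check of the point above, as a sketch under assumptions: `gen`, `emb` and `v` are made-up stand-ins for the weight generator, one class embedding and the backbone features, with tiny sizes chosen only for illustration. It compares the gradient reaching the generator when the weights are wrapped in nn.Parameter with the gradient when the plain tensor is assigned.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gen = nn.Linear(8, 4)          # hypothetical weight generator
emb = torch.randn(8)           # hypothetical class embedding
v = torch.randn(2, 4, 5, 5)    # hypothetical backbone features

# Variant 1: nn.Parameter creates a new leaf tensor, so the graph
# back to `gen` is cut.
dc = nn.Conv2d(4, 1, kernel_size=1)
w = gen(emb).reshape(1, 4, 1, 1)
dc.weight = nn.Parameter(w)
dc(v).sum().backward()
grad_with_parameter = gen.weight.grad   # still None at this point

# Variant 2: drop the registered parameter and assign the plain tensor.
dc2 = nn.Conv2d(4, 1, kernel_size=1)
del dc2.weight
w2 = gen(emb).reshape(1, 4, 1, 1)
dc2.weight = w2
dc2(v).sum().backward()
del dc2.weight                          # leave no side effect behind
grad_with_tensor = gen.weight.grad      # now populated
```

In the second variant the convolution no longer owns a learnable weight, so an optimizer built from the model's parameters would update only the generator, the embedding and the convolution bias.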

Second note:
You don’t have to run your backbone multiple times; you can compute v once and reuse it!
Similarly, you can compute all the new weights at once, and then just use the for loop from the moment you change the weights onward.
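
A rough sketch of this, with one extra step that is an assumption rather than part of the suggestion: because each class filter is a 1x1 convolution, all of them can be stacked and applied in a single functional call, which removes the loop entirely. The single-conv `backbone` is a stand-in for the ResNet layers, and the layer sizes mirror the ones in the posted model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_ch = 10, 64
embeddings = nn.Embedding(num_classes, 128)
wt = nn.Sequential(nn.Linear(128, 256), nn.Linear(256, feat_ch))
# Stand-in for the ResNet layers used as backbone in the posted model.
backbone = nn.Conv2d(3, feat_ch, kernel_size=3, padding=1)

imgs = torch.randn(4, 3, 32, 32)

v = backbone(imgs)                          # computed once: [4, 64, 32, 32]
w = wt(embeddings.weight)                   # filters for all classes: [10, 64]
w = w.reshape(num_classes, feat_ch, 1, 1)   # one 1x1 filter per class
response_maps = F.conv2d(v, w) / 8          # [4, 10, 32, 32], one map per class
```

Since `w` is used directly and never wrapped in a Parameter, gradients reach `wt`, `embeddings` and `backbone`.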

I want gradients to flow to wt (the weight generator), the embedding layer, and the backbone, but not to the DC layer, since it just receives its weights from the weight generator using the class embedding of each class.

As for the response map, I want to display it for all the classes.

resume() is doing that, right?

No, it is not giving me a response map for each class, and moreover the visualisation is not good.

I cannot run the code, but this seems to loop over all the classes, no?

    act = activation['dc'].squeeze()
    fig, ax = plt.subplots(act.size(0))
    for idx in range(act.size(0)):
        ax[idx].imshow(act[idx].cpu())

Also, what do you mean by the visualisation not being good? Are the results not good, or is the layout not good?
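
One thing that might explain the missing per-class maps, offered as a guess since I cannot run the code: the dc layer is called once per class inside forward, so a hook that writes into activation['dc'] keeps only the output of the last call. In addition, the squeezed tensor has shape [batch, H, W], so the loop would run over the images in the batch, not over the classes. A sketch that collects one map per call instead, where `dc` and `features` are stand-ins and the class-specific weight update is omitted:

```python
import torch
import torch.nn as nn

num_classes = 10
dc = nn.Conv2d(64, 1, kernel_size=1)   # stand-in for the dynamic conv layer
features = torch.randn(1, 64, 8, 8)    # stand-in backbone output for one image

captured = []                          # one entry per call to `dc`


def collect(module, inputs, output):
    captured.append(output.detach())


handle = dc.register_forward_hook(collect)
with torch.no_grad():
    for class_idx in range(num_classes):
        # The real model would install the class-specific weights here.
        dc(features)
handle.remove()

# Each captured output is [1, 1, H, W]; take the single image and the
# single channel, then stack into [num_classes, H, W].
per_class_maps = torch.stack([m[0, 0] for m in captured])
```

With the real model, the same hook would fill `captured` during a single call to model(image), as long as forward loops over the classes in order.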

This is what I am getting.
[Image: original image]

[Image: response map]

I can hardly interpret anything from this.
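
One possible way to make such maps easier to read, sketched under assumptions: upsample each class map to the image resolution, rescale it to [0, 1], and draw it semi-transparently over the input image. The random `image` and `maps` tensors are placeholders for a de-normalized CIFAR-10 image and the ten per-class response maps; the colormap, the transparency and the output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F

image = torch.rand(3, 32, 32)   # placeholder image with values in [0, 1]
maps = torch.randn(10, 8, 8)    # placeholder per-class response maps

# Upsample every map to the image resolution.
maps_up = F.interpolate(maps.unsqueeze(1), size=tuple(image.shape[1:]),
                        mode="bilinear", align_corners=False).squeeze(1)

# Rescale each map to [0, 1] on its own so it uses the full colormap range.
flat = maps_up.flatten(1)
lo = flat.min(dim=1).values.view(-1, 1, 1)
hi = flat.max(dim=1).values.view(-1, 1, 1)
maps_norm = (maps_up - lo) / (hi - lo + 1e-8)

# One panel per class: the image with its response map drawn on top.
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for idx, ax in enumerate(axes.flat):
    ax.imshow(image.permute(1, 2, 0).numpy())
    ax.imshow(maps_norm[idx].numpy(), cmap="jet", alpha=0.5)
    ax.set_title(f"class {idx}")
    ax.axis("off")
fig.tight_layout()
fig.savefig("response_maps.png")
```

Because each map is rescaled independently, colors show where a class responds within its own map and should not be compared across panels.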