Confusion about impact on backward function

kai-tub · January 13, 2020, 9:58pm

Hi,
first of all thanks to everyone working on this awesome library!
I’ve been working with PyTorch for a couple of months now, but I’ve reached a point where I cannot explain a specific behavior. I am sorry that I could not produce a smaller minimal example, but the problem does not appear in smaller versions:

"""
@author: Utku Ozbulak - github.com/utkuozbulak
Available at: https://github.com/utkuozbulak/pytorch-cnn-visualizations/blob/master/src/cnn_layer_visualization.py
Small modifications:
"""

import numpy as np
import torch
from torch.optim import Adam
from torchvision import models


class CNNLayerVisualization:
    """
        Produces an image that minimizes the loss of a convolution
        operation for a specific layer and filter
    """

    def __init__(self, model, selected_layer, selected_filter):
        self.model = model
        self.model.eval()
        self.selected_layer = selected_layer
        self.selected_filter = selected_filter
        self._conv_output = None

    def visualise_layer(self):
        # Generate a pseudo-random image
        np.random.seed(123)
        random_image = np.random.uniform(0, 1, (3, 224, 224))
        # Process image => For simplicity, do not transform here:
        # processed_image = preprocess_image(random_image, False)
        # Add a batch dimension at the front. Tensor shape = 1,3,224,224
        processed_image = torch.from_numpy(random_image).float().unsqueeze_(0)
        processed_image.requires_grad = True

        # Define optimizer for the image
        optimizer = Adam([processed_image], lr=0.1)
        for i in range(1, 5):
            optimizer.zero_grad()
            # Assign the created image to a variable to move forward in the model
            x = processed_image
            for index, layer in enumerate(self.model):
                # Forward pass layer by layer
                # x is not used after this point because it is only needed to trigger
                # the forward hook function
                x = layer(x)
                # Only need to forward until the selected layer is reached
                if index == self.selected_layer:
                    self._conv_output = x[0, self.selected_filter]
                    # TODO: Explain why it makes a difference when the following
                    # line is removed!
                    break
            # Loss function is the negative mean of the output of the selected layer/filter
            # Minimizing it maximizes the mean activation of that specific filter
            loss = -torch.mean(self._conv_output)
            loss.backward()
            print("Loss:", "{0:.2f}".format(loss.item()))
            optimizer.step()


if __name__ == "__main__":
    cnn_layer = 0
    filter_pos = 2
    pretrained_model = models.vgg16(pretrained=True).features
    layer_vis = CNNLayerVisualization(pretrained_model, cnn_layer, filter_pos)
    layer_vis.visualise_layer()

The problem lies in the line marked with the TODO. I wanted to visualize different filters using gradient ascent, and I’ve used this awesome GitHub project as a reference.
Our own model, however, is not iterable like the vgg16 model used in the repository. In my opinion, it shouldn’t make a difference whether the input image, which is being optimized, goes through the full model or stops at the desired layer, as the backward function is only applied to the operations leading up to the desired layer/filter. Smaller tests confirm this. But if you run the given script once with the break line and once without it, the loss differs significantly. I cannot explain why and hope that somebody can demystify this for me.

Thanks!

albanD · January 13, 2020, 10:09pm

Hi,

Have you tried running the loop for more than 5 iterations? Do they converge to the same result?
Is it simply because you get different random samples?

kai-tub · January 13, 2020, 10:40pm

Hi,

Have you tried running the loop for more than 5 iterations?

It should always produce the same result at each step.
This is easy to verify by running the script multiple times: the loss stays the same between runs, but differs if the break line is removed.

Do they converge to the same result?

Convergence is not the point; the question is why the results differ at each step.

Is it simply because you get different random samples?

Setting np.random.seed(123) should take care of always producing the same sample.

albanD · January 13, 2020, 11:12pm

Setting np.random.seed(123) should take care of always producing the same sample

Unless you generate more random numbers later in the network. Then, depending on whether you run these layers, you will get different random inputs.
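To make the "random numbers later in the network" idea concrete, here is a minimal sketch. It assumes the randomness would come from a layer such as nn.Dropout (the thread does not name one); the tensor and variable names are made up for illustration. Note that the posted script calls self.model.eval(), which disables this kind of randomness.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a layer that draws random numbers in its forward pass.
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# In training mode, every call draws a fresh random mask.
dropout.train()
train_a = dropout(x)
train_b = dropout(x)

# In eval mode, dropout is the identity and draws no random numbers.
dropout.eval()
eval_out = dropout(x)

print("train outputs identical:", torch.equal(train_a, train_b))
print("eval output equals input:", torch.equal(eval_out, x))
```

So for a model in eval mode, a layer like this would not explain a difference between the two runs.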

TODO: Explain why it makes a difference when the following

Maybe the layer just after the conv is an activation function that is applied in place, so it overwrites the output of the conv. Because x[0, self.selected_filter] is a view of that output, it changes too. You can clone the result to avoid this: self._conv_output = x[0, self.selected_filter].clone().
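The effect can be reproduced with a rough sketch like the following. It is not the original script: the small conv layer, the random input, and the helper selected_filter_output are made up for illustration, and filter index 2 just mirrors filter_pos from the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer stand-in for the first layers of the model.
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
relu = nn.ReLU(inplace=True)
image = torch.rand(1, 3, 8, 8)


def selected_filter_output(run_next_layer, clone):
    """Return the output of filter 2, optionally running the in-place ReLU afterwards."""
    x = conv(image)
    selected = x[0, 2]  # integer indexing returns a view into x
    if clone:
        selected = selected.clone()  # independent copy, unaffected by later in-place ops
    if run_next_layer:
        x = relu(x)  # overwrites the conv output in place
    return selected.detach()


early_exit = selected_filter_output(run_next_layer=False, clone=False)  # like "break"
full_pass = selected_filter_output(run_next_layer=True, clone=False)    # no "break"
full_pass_cloned = selected_filter_output(run_next_layer=True, clone=True)

print("view was changed by the ReLU:", not torch.equal(early_exit, full_pass))
print("clone matches early exit:", torch.equal(early_exit, full_pass_cloned))
```

Without the clone, the stored filter output ends up holding the post-ReLU values, so its mean (and therefore the loss) changes once the next layer runs.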

kai-tub · January 13, 2020, 11:19pm

You are correct!
The VGG16 model from PyTorch uses nn.ReLU(inplace=True), which seems to have caused the problem.
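For anyone who wants to check their own model, one possible way is sketched below. The small nn.Sequential is only a hypothetical stand-in for something like vgg16.features, and inplace_relu_names is a helper invented for this example. Switching the inplace attribute off is an alternative to cloning, at the cost of some extra memory.

```python
import torch.nn as nn

# Hypothetical stand-in for a feature extractor with in-place activations.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
)


def inplace_relu_names(model):
    """List the names of all nn.ReLU submodules that operate in place."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.ReLU) and module.inplace
    ]


found = inplace_relu_names(features)
print("in-place ReLU layers:", found)

# Make every ReLU return a new tensor instead of overwriting its input.
for module in features.modules():
    if isinstance(module, nn.ReLU):
        module.inplace = False

remaining = inplace_relu_names(features)
print("in-place ReLU layers after the change:", remaining)
```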

Thanks!