Feature Visualization through Random Image Optimization

I’m trying to visualize the output of a particular activation layer (LeakyReLU) through random image optimization, but for some reason all I get is noise. I’m posting my code below in case I’m overlooking something. This is a YOLO (Darknet53) network, by the way, and I’m attaching the hook before the first YOLO layer in the network.

import numpy as np
import torch
from torchvision.transforms import ToPILImage

# Tensor wrapper.
Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

# Apply Feature Visualization to each feature map. The random image should be of size CxHxW.
img = (np.uint8(
    np.random.uniform(
        150,
        180,
        (3,
         int(self.model.hyperparams['height']),
         int(self.model.hyperparams['height']))
    )) / 255)

# Convert numpy image to tensor and add 1 dimension, final outcome is 1xCxHxW.
img_tensor = torch.tensor(img).unsqueeze(0).type(Tensor).detach().requires_grad_(True)
optimizer = torch.optim.Adam([img_tensor], lr=0.1, weight_decay=1e-6)

for k in range(0, 30):
    optimizer.zero_grad()

    _ = self.model(img_tensor)

    loss = self.activation_output.mean()
    loss *= -1
    print('> Iteration: {} Loss: {:.2f}'.format(k + 1, loss.item()))
    loss.backward()

    optimizer.step()

    if k == 30 - 1:
        img = img_tensor[0].detach().cpu().numpy().transpose(1, 2, 0)  # Transform to HxWxC.

image = ToPILImage(mode='RGB')(img)

The code might be alright, but I’m unfamiliar with the use case.
It seems you are trying to maximize the intermediate activation by optimizing the input tensor?
Do you have any references explaining why the result would not be a random-looking input tensor, i.e. should the input tensor learn some structure for this task?

It is a method called “Activation Maximization”, introduced in a paper by Bengio et al. The goal is to pass a random image through the network up to a certain activation layer (one previously found to activate more strongly than others for a particular image). By updating the pixels of the random image along the gradient of the mean activation (steepest ascent), the result should be an image that maximally activates that particular feature map.
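To make the description above concrete, here is a minimal sketch of activation maximization with a forward hook. It is only an illustration under stated assumptions: the two-layer network, the hooked layer index, the image size, the learning rate and the names `save_activation`, `captured` and `feature_map` are all hypothetical stand-ins, not the YOLO (Darknet53) setup from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in network (NOT the YOLO/Darknet53 model from the thread).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.LeakyReLU(0.1),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.LeakyReLU(0.1),
).eval()

# Only the image is optimized, so the network weights are frozen.
for p in model.parameters():
    p.requires_grad_(False)

captured = {}

def save_activation(module, inputs, output):
    # Store the output of the hooked layer on every forward pass.
    captured["out"] = output

# Hook an intermediate activation layer (here the first LeakyReLU).
handle = model[1].register_forward_hook(save_activation)

# Random starting image of shape 1xCxHxW with values in [0.6, 0.7).
img = torch.rand(1, 3, 32, 32).mul(0.1).add(0.6).requires_grad_(True)
optimizer = torch.optim.Adam([img], lr=0.05)
feature_map = 0  # index of the feature map to maximize (assumed)

history = []
for step in range(20):
    optimizer.zero_grad()
    model(img)
    activation = captured["out"][0, feature_map].mean()
    # Minimizing the negative mean activation is gradient ascent on it.
    (-activation).backward()
    optimizer.step()
    history.append(activation.item())

handle.remove()
print("first: {:.3f}, last: {:.3f}".format(history[0], history[-1]))
```

Selecting a single feature map (rather than averaging over the whole layer output) is a design choice in this sketch; averaging over all channels, as in the question's code, optimizes for the layer as a whole.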

I can attest that the image is changing properly and that the mean activation increases from 0.98 to 3.75, where it plateaus. Maybe the problem is with my post-processing of the resulting image, since I’m not an expert in image manipulation.
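On the post-processing question, one thing that may be worth checking is the value range and dtype of the array handed to the image library. The optimized tensor is unconstrained, so its values can drift outside [0, 1], and image libraries generally expect HxWxC `uint8` data. The sketch below shows one common way to rescale and cast. The helper name `to_uint8_image` and the min-max rescaling are assumptions for illustration, and whether this explains the noise in this particular case is not established.

```python
import numpy as np

def to_uint8_image(chw, lo=None, hi=None):
    """Convert a float CxHxW array into an HxWxC uint8 array.

    Values are rescaled from [lo, hi] to [0, 255]; by default lo and hi
    are the array's own minimum and maximum (min-max normalization).
    """
    arr = np.asarray(chw, dtype=np.float64)
    lo = arr.min() if lo is None else lo
    hi = arr.max() if hi is None else hi
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero
    arr = np.clip((arr - lo) / scale, 0.0, 1.0)
    return (arr * 255.0).round().astype(np.uint8).transpose(1, 2, 0)

# Stand-in for an optimized image whose values have left the [0, 1] range.
rng = np.random.default_rng(0)
optimized = rng.normal(0.6, 0.5, size=(3, 4, 5))
image = to_uint8_image(optimized)
print(image.shape, image.dtype)
```

Passing fixed bounds such as `lo=0.0, hi=1.0` clips instead of stretching, which keeps brightness comparable across images at the cost of saturating out-of-range pixels.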

This workflow sounds similar to the DeepDream approach.
If I remember correctly, one shortcoming of optimizing all input pixels directly is that high-frequency patterns are created (as might be the case here).
From the Inceptionism blog post:

Start with an image full of random noise, then gradually tweak the image towards what the neural net considers a banana (see related work in [1], [2], [3], [4]). By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.
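One common way to encode the “neighboring pixels should be correlated” idea is a total variation penalty added to the objective. The sketch below is an assumption about how such a prior could look, not the regularizer used in the blog post; the function name `total_variation` and the weight `tv_weight` mentioned in the comment are hypothetical.

```python
import torch

def total_variation(img):
    """Mean absolute difference between neighboring pixels of a 1xCxHxW tensor."""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()  # vertical neighbors
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()  # horizontal neighbors
    return dh + dw

torch.manual_seed(0)
flat = torch.full((1, 3, 16, 16), 0.5)   # perfectly smooth image
noisy = torch.rand(1, 3, 16, 16)         # uncorrelated pixel noise

tv_flat = total_variation(flat).item()
tv_noisy = total_variation(noisy).item()
print("smooth: {:.3f}, noisy: {:.3f}".format(tv_flat, tv_noisy))

# Inside an optimization loop the penalty would be added to the objective,
# for example (tv_weight is a hypothetical hyperparameter to tune):
# loss = -activation.mean() + tv_weight * total_variation(img)
```

The penalty is zero for a constant image and grows with pixel-to-pixel differences, so it pushes the optimizer away from high-frequency noise. Other priors mentioned in the feature visualization literature, such as blurring or jittering the image between steps, serve a similar purpose.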