Strange feature map; 87 % zeros

Does anybody know why over 87 % of the output tensor of my feature map is 0?

It happens with all image sizes (224x224, 512x512, 1242x375).
The output tensor of my feature extractor has shape [1, 214016] (flattened) for 1242x375 px images.
But only 26721 of those values are non-zero.

I am using vgg16 and I have removed the classifier and avgpool layers.
Any ideas?

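import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda"

# Model: pretrained VGG16 with the classifier and avgpool layers emptied out
model = models.vgg16(weights='IMAGENET1K_V1')
model.classifier = nn.Sequential(*list(model.classifier.children())[:-7])
model.avgpool = nn.Sequential(*list(model.avgpool.children())[:-1])
model = model.to(device).eval()

# img_tensor is the preprocessed image tensor of shape [3, H, W], loaded elsewhere
img_model = img_tensor.unsqueeze(0)  # add the batch dimension
image_device = img_model.to(device)
with torch.no_grad():  # inference only, so skip gradient tracking
    img_feature = model(image_device)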

This is what the model looks like:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): Sequential()
  (classifier): Sequential()
)

Check the outputs of the layer just before the final ReLU. ReLU sets every negative input to 0, so if most of the values coming out of the final layers are 0, it's probably because most of the values in the preceding layer are negative.
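One way to run that check is a forward hook on the last convolution, which captures the values before the final ReLU sees them. This is only a minimal sketch: `tiny` is a hypothetical stand-in for the tail of the network so that it runs without downloading weights. With the model from the question, the hook would presumably go on `model.features[28]`, the last Conv2d in the printout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the tail of VGG16's `features`: conv -> ReLU -> max-pool.
tiny = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
).eval()

captured = {}

def save_pre_relu(module, inputs, output):
    # Clone the tensor: the in-place ReLU that follows would otherwise
    # overwrite the captured values with their clamped versions.
    captured["pre_relu"] = output.detach().clone()

handle = tiny[0].register_forward_hook(save_pre_relu)
with torch.no_grad():
    out = tiny(torch.randn(1, 3, 32, 32))
handle.remove()

pre = captured["pre_relu"]
frac_negative = (pre < 0).float().mean().item()
frac_zero_out = (out == 0).float().mean().item()
print(f"negative before ReLU: {frac_negative:.1%}")
print(f"zeros after ReLU + max-pool: {frac_zero_out:.1%}")
```

An output value is 0 only when all four values in its 2x2 pooling window were non-positive before the ReLU, so a large share of zeros at the output implies an even larger share of negative pre-activations.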

If you’re using models from torchvision.models without pre-trained weights and getting lots of small negative values, I would recommend re-initialising the weights in the network before training.
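A minimal sketch of such a re-initialisation, assuming Kaiming (He) normal weights and zero biases, which is one common choice for ReLU networks; the reply does not say which scheme it has in mind. The helper `reinit` and the small network `net` are hypothetical stand-ins for illustration.

```python
import torch
import torch.nn as nn

def reinit(module: nn.Module) -> None:
    # Assumed scheme: Kaiming-normal weights and zero biases for conv/linear layers.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Hypothetical small network; the same call works on any nn.Module.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.ReLU(),
)
net.apply(reinit)  # apply() visits every submodule recursively
```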

That might not be a bad thing. How many non-zero values do you need in order to classify something? 26,721 non-zero values out of 214,016 (about 12.5%) is still a lot of features to classify from.
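For reference, the numbers quoted in the question can be checked with a few lines of arithmetic. The sketch below assumes the layer layout from the printout: five 2x2 max-pools, size-preserving 3x3 convolutions, and 512 channels at the end. The helper name `pooled` is made up for illustration.

```python
# Each 2x2 max-pool with stride 2 halves a spatial side, rounding down.
def pooled(size: int, n_pools: int = 5) -> int:
    for _ in range(n_pools):
        size //= 2
    return size

width, height = 1242, 375
flat = 512 * pooled(height) * pooled(width)  # channels * H_out * W_out
nonzero = 26721  # count reported in the question

print(pooled(width), pooled(height))     # 38 11
print(flat)                              # 214016
print(f"{nonzero / flat:.1%} non-zero")  # 12.5% non-zero
```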

If you had 1,000 classes, you would in principle need only log_2(1,000) ≈ 10 binary values to tell them all apart.
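The counting argument can be made concrete: ten binary features are enough to give each of 1,000 classes its own code. This is just an illustration of the lower bound, not a claim about how a trained network actually encodes classes.

```python
import math

num_classes = 1000
bits = math.ceil(math.log2(num_classes))
print(bits)  # 10

# Give every class index its own fixed-width binary code.
codes = {c: format(c, f"0{bits}b") for c in range(num_classes)}
print(codes[999])                # 1111100111
print(len(set(codes.values())))  # 1000 distinct codes
```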

Granted, in real classification problems the features rarely separate every class cleanly from all the others, so having more features available is better.

You can think of each of those features as describing something contained in the picture, such as has feathers, has wheels, has white, or has red. The classifier matrix (your final layer) then just multiplies each feature by a trainable weight and adds the results up into a score for each class. For example, has a beak, is green, has feathers, and is shaped like a bird might classify strongly as a parrot.
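That picture of the final layer can be sketched as a matrix-vector product followed by a softmax. The feature names, class names, and weights below are hand-picked and purely hypothetical; a real classifier learns its weights during training.

```python
import torch

feature_names = ["has_beak", "is_green", "has_feathers", "has_wheels"]
class_names = ["parrot", "car"]

# Hypothetical weights: one row per class, one column per feature.
weights = torch.tensor([
    [2.0, 1.0, 2.0, -3.0],   # parrot
    [-2.0, 0.0, -2.0, 3.0],  # car
])
bias = torch.zeros(2)

# An image with a beak, green colour, and feathers, but no wheels.
features = torch.tensor([1.0, 1.0, 1.0, 0.0])

logits = weights @ features + bias    # weighted sum of features per class
probs = torch.softmax(logits, dim=0)  # turn scores into class probabilities
predicted = class_names[int(probs.argmax())]
print(predicted)  # parrot
```

Features that are 0 contribute nothing to any class score, which is why a sparse feature vector is not a problem in itself.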