Feature selection on feature maps using mutual information


I have an MDNet with 3 conv layers followed by a few fully-connected layers, and I want to do feature selection after the third conv layer, because there’s some redundancy (the net tracks a single object, so it’s effectively a two-class classification: background vs. object).
I have the feature vector (of length 4608; actually its shape is [batch_size, 4608]) from 512 feature maps, and I want to select only the 256 feature maps with low MI between themselves.
From the mutual information I should get a 512x512 matrix, as described at the bottom of page 5, and then do the feature selection.

In short:
The mutual information between two feature maps is estimated using histograms of them. The activation values are distributed in 20 bins and then we can get probabilities for each bin. The mutual information of any two feature maps is calculated and these values are stored in a 512x512 matrix.

I have a few questions:

  1. First, how do I distribute the feature vector into 20 bins and calculate the mutual information matrix?
    I’m not sure there’s a way to do it in PyTorch; maybe I should convert to NumPy, compute the bins with np.histogram, somehow derive the distribution, use mutual_info_score from sklearn(?), and then convert back to a tensor.
    Or maybe stay with tensors, compute the bins, assign the data, and calculate the MI myself?
  2. Assuming I get that matrix and find the 256 features with the lowest values between themselves (excluding 0), how do I select those specific feature maps and pass them to the next FC layer?
  3. I pass batches to the network, so the feature extractor returns features for a whole batch rather than a single sample; the feature vector’s shape is therefore [batch_size, 4608] and not just 4608. Should that change anything?

In short: I want to do feature selection on the feature maps using the mutual information between them. I don’t know exactly how to do it, and I’m probably making it more complicated than it should be.


It’s not necessary for answering, but here is the link to the MDNet files on GitHub (the net is in modules/model, and the feature extractor, forward_samples, is in tracking/run_tracker.py):

  1. As far as I understand, you don’t need to backpropagate through the MI calculation. If that’s correct, I would recommend using whatever seems simpler. E.g. if sklearn already has some utility functions that make your life easier, just use them and calculate the necessary indices for your feature selection. Note that I also assume this computation will only be applied once in a while, so performance doesn’t really matter and the missing GPU support in NumPy is not a huge bottleneck.

  2. If you would like to select some feature maps from the conv activation, you could simply index them in the forward method of your model.

  3. As long as you calculate the feature indices for each sample in the batch, step 2 should work just fine.
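For the histogram step in point 1, here is a minimal NumPy sketch of how the MI matrix could be computed. It is not taken from the paper or from MDNet: the function name `mutual_info_matrix`, the channel-major layout of the flattened vector, and the use of natural logarithms are my assumptions. Each pair of maps gets a 20x20 joint histogram, which is normalised into probabilities and plugged into the usual MI sum.

```python
import numpy as np

def mutual_info_matrix(feats, n_maps, n_bins=20):
    """Histogram-based MI (in nats) between every pair of feature maps.

    feats: array of shape [batch_size, n_maps * k], assumed to be flattened
    channel-major, i.e. the result of x.view(batch_size, -1) on [B, C, H, W].
    """
    batch_size = feats.shape[0]
    # One row per map, holding all of its activations across the batch.
    per_map = feats.reshape(batch_size, n_maps, -1).transpose(1, 0, 2).reshape(n_maps, -1)
    mi = np.zeros((n_maps, n_maps))
    for i in range(n_maps):
        for j in range(i, n_maps):
            # Joint histogram over n_bins x n_bins cells, normalised to probabilities.
            joint, _, _ = np.histogram2d(per_map[i], per_map[j], bins=n_bins)
            p_xy = joint / joint.sum()
            p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of map i
            p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of map j
            nz = p_xy > 0                           # skip empty cells (0 * log 0 = 0)
            mi[i, j] = mi[j, i] = np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz]))
    return mi

# Toy stand-in for the real [batch_size, 512 * 9] features: 8 maps of 9 values each.
rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8 * 9))
feats[:, 9:18] = feats[:, 0:9]          # make map 1 an exact copy of map 0
mi = mutual_info_matrix(feats, n_maps=8)
```

With a tensor coming out of the network, `features.detach().cpu().numpy()` would give the `feats` array. For 512 maps the double loop runs about 131k histograms, which should be acceptable if this is only done once in a while. If you prefer sklearn, `mutual_info_score(None, None, contingency=joint)` should give the same value from the joint histogram.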

  1. Yeah, I don’t need to backprop the MI, just for feature selection.
    I’ll probably do the feature selection with sklearn or something like that, I already have most of it.

  2. Sounds like it should be quite easy.
    Can you give a short code example of the changes in the forward part? Let’s say a conv layer with 5 feature maps, an FC layer that follows, and a selection of features according to [0,1,0,1,1].

  3. Great.
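Regarding the selection itself (picking the maps with low MI between themselves), one possible approach is a greedy heuristic. The thread does not say how the paper does it, so the sketch below is only an assumption: `select_low_mi_maps` is a made-up name, and "lowest total MI with the maps already chosen" is just one reasonable criterion.

```python
import numpy as np

def select_low_mi_maps(mi, n_keep):
    """Greedily pick n_keep map indices with low pairwise MI (heuristic sketch)."""
    mi = mi.copy()
    np.fill_diagonal(mi, 0.0)                 # ignore each map's MI with itself
    n = mi.shape[0]
    first = int(np.argmin(mi.sum(axis=1)))    # map least redundant with all others
    selected = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    cost = mi[first].copy()                   # total MI of each map with the selected set
    while len(selected) < n_keep:
        # Already-selected maps are masked out with +inf so they cannot be picked again.
        nxt = int(np.argmin(np.where(remaining, cost, np.inf)))
        selected.append(nxt)
        remaining[nxt] = False
        cost += mi[nxt]
    return np.sort(np.array(selected))

# Toy 4x4 matrix: maps 0 and 1 are highly redundant, maps 2 and 3 are not.
toy_mi = np.array([[1.0, 0.9, 0.1, 0.1],
                   [0.9, 1.0, 0.1, 0.1],
                   [0.1, 0.1, 1.0, 0.05],
                   [0.1, 0.1, 0.05, 1.0]])
keep = select_low_mi_maps(toy_mi, n_keep=3)
```

In the toy example only one of the two redundant maps (0 and 1) ends up in `keep`. For the real case this would be called with the 512x512 matrix and `n_keep=256`.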

This code snippet would work for a batch size of 2.
Note that I assume you are using a constant number of selected features in each batch (in this case 3 features).

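import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 3, 1, 1)  # 5 feature maps, spatial size preserved
        self.fc = nn.Linear(3*10*10, 10)  # 3 selected maps of size 10x10

    def forward(self, x, index):
        x = self.conv1(x)  # [batch_size, 5, 10, 10]
        # The boolean mask keeps 3 maps per sample: [batch_size * 3, 10, 10],
        # which is then flattened to [batch_size, 3*10*10]
        x = x[index].view(x.size(0), -1)
        x = self.fc(x)
        return x


batch_size, c, h, w = 2, 1, 10, 10
x = torch.randn(batch_size, c, h, w)
# One row per sample; True marks a feature map to keep (3 per row)
index = torch.tensor([[0, 1, 0, 1, 1],
                      [1, 0, 1, 1, 0]], dtype=torch.bool)
model = MyModel()
out = model(x, index)  # [batch_size, 10]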

Yes, the number of selected features is constant.

Thank you! :)
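If the same feature maps are kept for every sample (which would be the case when the MI-based selection is computed once and then fixed), the per-sample mask is not strictly needed. A possible variant, sketched here under that assumption, stores the kept channel indices in the model and indexes the channel dimension directly. The names `SelectedMapsModel` and `keep` are made up for this example.

```python
import torch
import torch.nn as nn

class SelectedMapsModel(nn.Module):
    def __init__(self, keep, in_maps=5, spatial=10, n_out=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, in_maps, 3, 1, 1)
        # Registered as a buffer so the indices follow the model across devices.
        self.register_buffer("keep", torch.as_tensor(keep, dtype=torch.long))
        self.fc = nn.Linear(len(keep) * spatial * spatial, n_out)

    def forward(self, x):
        x = self.conv1(x)             # [batch_size, in_maps, H, W]
        x = x[:, self.keep]           # [batch_size, len(keep), H, W]
        return self.fc(x.flatten(1))  # flatten everything except the batch dim

# Keep maps 1, 3 and 4, i.e. the selection [0,1,0,1,1] from the question.
model = SelectedMapsModel(keep=[1, 3, 4])
out = model(torch.randn(2, 1, 10, 10))
```

This works for any batch size, since the index only touches the channel dimension.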


Hi @ptrblck! How would you suggest proceeding if we do want to backprop through the MI calculation? Say we want to learn a mapping from X (e.g. a given hidden layer) to Z (e.g. a compressed version of X) such that the mutual information between X and Z is maximal.
I have come across this paper (https://arxiv.org/pdf/1801.04062.pdf), which builds a neural estimator of the MI that can be backpropagated through, but I was wondering if there is a simpler way. Thanks :)
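For reference, the core of the neural estimator in the linked paper is the Donsker-Varadhan lower bound, I(X;Z) >= E_joint[T(x,z)] - log E_marginals[exp(T(x,z))], maximised over a small network T. The following is only a minimal sketch of that idea, not the paper's implementation: `StatisticsNet`, `dv_lower_bound`, the layer sizes and the single optimisation step are placeholders I chose, and the refinements discussed in the paper are left out.

```python
import math
import torch
import torch.nn as nn

class StatisticsNet(nn.Module):
    """Small network T(x, z) that scores (x, z) pairs."""
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1)).squeeze(1)

def dv_lower_bound(stat_net, x, z):
    """Donsker-Varadhan estimate: E_joint[T] - log E_marginals[exp(T)]."""
    joint_term = stat_net(x, z).mean()
    # Shuffling z across the batch breaks the pairing, which approximates
    # samples from the product of the marginals.
    z_shuffled = z[torch.randperm(z.size(0))]
    # log-mean-exp, computed stably via logsumexp.
    marginal_term = torch.logsumexp(stat_net(x, z_shuffled), dim=0) - math.log(z.size(0))
    return joint_term - marginal_term

torch.manual_seed(0)
encoder = nn.Linear(16, 4)                 # the mapping X -> Z to be learned
stat_net = StatisticsNet(x_dim=16, z_dim=4)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(stat_net.parameters()), lr=1e-3)

x = torch.randn(128, 16)                   # stand-in for hidden-layer activations
loss = -dv_lower_bound(stat_net, x, encoder(x))   # maximise the bound
opt.zero_grad()
loss.backward()                            # gradients reach both networks
opt.step()
```

Both the encoder and the statistics network are updated to increase the bound, so in practice this step would sit inside the usual training loop.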