Differentiable indexing for the channels

tremblerz · September 14, 2020, 1:22am

I am trying to write a neural network that learns to assign a score to each channel, but backprop fails. Here is a toy example that illustrates the problem:

import torch
import torch.nn as nn

class ChannelScorer(nn.Module):
    def __init__(self):
        super(ChannelScorer, self).__init__()
        self.layer1 = nn.Conv2d(in_channels=100, out_channels=200, kernel_size=3, stride=2)
        # an 8x8 input gives a 3x3 output, so 200 * 3 * 3 = 1800 features
        self.layer2 = nn.Linear(1800, 100)
    def forward(self, x):
        z1 = self.layer1(x)
        z1 = torch.flatten(z1, start_dim=1)
        z2 = self.layer2(z1)
        return z2

channel_scorer = ChannelScorer()

x = torch.rand(32, 100, 8, 8)
y = torch.rand(32, 100, 8, 8)
indices_score = channel_scorer(x)
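indices = torch.topk(indices_score, 50)[1]  # integer indices, shape (32, 50), no grad_fn
# note: a (32, 50) index in dim 1 zeroes the union of all selected channels for every sample
x[:, indices, :, :] = 0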

loss = (x-y)**2
loss = torch.mean(loss)

loss.backward()

I get the following error:

     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100
    101
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

My guess is that indexing with the topk indices makes the whole thing non-differentiable. Is there a way to work around this?
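A short check, separate from the original thread, makes the guess concrete: `torch.topk` returns values that stay in the autograd graph and indices that do not, and assigning a constant through those indices leaves nothing for `backward()` to follow. The tensor shapes here are small illustrative values, not the ones from the toy example.

```python
import torch

torch.manual_seed(0)

# Illustrative stand-in for the scorer output: 4 samples, 10 channel scores.
scores = torch.randn(4, 10, requires_grad=True)
values, indices = torch.topk(scores, k=3)

print(values.grad_fn is not None)              # True: the selected values stay in the graph
print(indices.requires_grad, indices.grad_fn)  # False None: integer indices carry no gradient

# Writing a constant through those indices gives autograd nothing to follow:
# x is a plain tensor and the assigned value (0) does not depend on `scores`.
x = torch.rand(4, 10, 8, 8)
x[:, indices[0], :, :] = 0
loss = x.mean()
print(loss.requires_grad)                      # False

backward_failed = False
try:
    loss.backward()
except RuntimeError:
    backward_failed = True                     # same kind of RuntimeError as in the question
print(backward_failed)                         # True
```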

ptrblck · September 16, 2020, 8:53am

The indices returned by topk do not have a grad_fn, since selecting indices is not a differentiable operation; using them to index therefore detaches the result from the computation graph.
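One common workaround, not part of the original reply, is to keep the hard top-k selection in the forward pass but let gradients reach the scorer through a smooth surrogate, the so-called straight-through estimator. This is a minimal sketch assuming the goal is per-sample channel masking; `straight_through_topk_mask` is a hypothetical helper name, and using `torch.sigmoid` of the scores as the surrogate is an assumption rather than the only option.

```python
import torch
import torch.nn as nn


class ChannelScorer(nn.Module):
    """Same architecture as the toy example in the question."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(in_channels=100, out_channels=200, kernel_size=3, stride=2)
        self.layer2 = nn.Linear(1800, 100)  # 200 channels * 3 * 3 spatial positions

    def forward(self, x):
        z1 = torch.flatten(self.layer1(x), start_dim=1)
        return self.layer2(z1)


def straight_through_topk_mask(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical helper: 1.0 at each sample's top-k scores, 0.0 elsewhere.

    The forward value is the hard 0/1 mask; the backward pass uses the
    gradient of sigmoid(scores) instead (straight-through estimator).
    """
    soft = torch.sigmoid(scores)
    idx = torch.topk(scores, k, dim=1).indices
    hard = torch.zeros_like(scores).scatter(1, idx, 1.0)  # no grad_fn on its own
    # (soft - soft.detach()) is exactly zero in value but carries soft's gradient.
    return hard + (soft - soft.detach())


torch.manual_seed(0)
scorer = ChannelScorer()
x = torch.rand(32, 100, 8, 8)
y = torch.rand(32, 100, 8, 8)

scores = scorer(x)                                  # (32, 100)
drop = straight_through_topk_mask(scores, k=50)     # 1.0 on each sample's 50 top-scored channels
keep = 1.0 - drop                                   # zero those channels, keep the rest
x_masked = x * keep[:, :, None, None]               # out-of-place, broadcast over H and W

loss = torch.mean((x_masked - y) ** 2)
loss.backward()                                     # no RuntimeError: the mask depends on scores
print(scorer.layer2.weight.grad is not None)        # True
```

Two design choices are worth flagging. The mask multiplies `x` out of place instead of assigning into it, so autograd can trace `x_masked` back to the scores, and each sample gets its own 50 channels zeroed rather than the union across the batch. The straight-through gradient is also a biased estimate; a purely soft mask (for example `torch.sigmoid(scores)` on its own) or a Gumbel-softmax style relaxation are alternatives with different trade-offs.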