{max, min}-pooling winner counter layer

Hello everyone,

I am implementing a max-pooling-like layer whose function is to count, for every input position i, how many times that position is the winner of the pooling operation (i.e. the argmax of a pooling window).

My initial implementation looks like this:

import torch
import torch.nn as nn

class MaxAccPool2d(nn.Module):
    """
    Accumulator maximum pooling module.
    Example:
        >>> import torch
        >>> from src import MaxAccPool2d
        >>> from torch.autograd import Variable
        >>> input = Variable(torch.Tensor([[
        ...     [1, 2, 3, 4, 5, 6, 7, 8, 9],
        ...     [5, 6, 7, 8, 9, 1, 2, 3, 4],
        ...     [9, 8, 7, 6, 5, 4, 3, 2, 1]]]))
        >>> c_maxaccpool = MaxAccPool2d(kernel_size=(2, 2), stride=(1, 1), padding=0)
        >>> c_maxaccpool(input)
        tensor([[[0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [0, 1, 1, 2, 4, 0, 0, 1, 1],
                 [1, 1, 0, 0, 0, 1, 0, 0, 0]]])
    """
    def __init__(self, kernel_size, stride=None, padding=0, dilation=1):
        super(MaxAccPool2d, self).__init__()
        # return_indices=True keeps the winner indices around for forward()
        # (note: dilation must be passed through, or it is silently ignored)
        self.maccpool2d = nn.MaxPool2d(kernel_size, stride, padding, dilation,
                                       return_indices=True, ceil_mode=False)

    def forward(self, input):
        input_length, input_shape = input.nelement(), input.size()
        _, indices = self.maccpool2d(input)
        # count how many pooling windows each flattened position has won
        return torch.bincount(indices.view(-1), minlength=input_length).view(input_shape)

When I started training my network, I noticed two things:

  1. bincount appears to be slow on the GPU.
  2. More critically, the solution is wrong: all the counts pile up in the first channel, because the indices returned by the pooling layer are local to each spatial plane rather than global to the whole tensor (see the snippet below).
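
A minimal repro (my own check, using nothing beyond nn.MaxPool2d itself) makes this visible: with return_indices=True the indices are flattened per H x W plane, so two channels holding the same data come back with identical index planes:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=1, return_indices=True)
x = torch.arange(16.0).view(1, 1, 4, 4).repeat(1, 2, 1, 1)  # two identical channels
_, idx = pool(x)
print(idx[0, 0])  # tensor([[ 5,  6,  7], [ 9, 10, 11], [13, 14, 15]])
print(torch.equal(idx[0, 0], idx[0, 1]))  # True: indices restart at 0 in every plane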

For the bincount-based solution to work, I need the indices to be computed with respect to the whole flattened tensor, as follows:

This is what return_indices returns for MaxPool2d:

---------------------------
|| | 0  1| |   | | 0  1| ||
|| | 2  3| |   | | 2  3| ||
||---------| , |---------||
|| | 0  1| |   | | 0  1| ||
|| | 2  3| |   | | 2  3| ||
---------------------------

This is what I need:

---------------------------
|| | 0  1| |   | | 8  9| ||
|| | 2  3| |   | |10 11| ||
||---------| , |---------||
|| | 4  5| |   | |12 13| ||
|| | 6  7| |   | |14 15| ||
---------------------------
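
One way to get that layout, sketched under the assumption of a 4-D (N, C, H, W) input (globalize_indices is just a hypothetical helper name), is to add to each plane's local indices the offset of that plane in the flattened tensor:

import torch

def globalize_indices(indices, input_shape):
    # indices: output of nn.MaxPool2d(..., return_indices=True),
    # with values local to each H*W plane
    N, C, H, W = input_shape
    # plane (n, c) starts at position (n * C + c) * H * W in the flat tensor
    offsets = torch.arange(N * C, device=indices.device).view(N, C, 1, 1) * (H * W)
    return indices + offsets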

The following is an example for which the implementation doesn’t work:

import torch
from src import MaxAccPool2d
from torch.autograd import Variable
input = Variable(torch.Tensor([[
    [[1, 2, 3, 4, 5, 6, 7, 8, 9],
     [5, 6, 7, 8, 9, 1, 2, 3, 4],
     [9, 8, 7, 6, 5, 4, 3, 2, 1]],
    [[1, 2, 3, 4, 5, 6, 7, 8, 9],
     [5, 6, 7, 8, 9, 1, 2, 3, 4],
     [9, 8, 7, 6, 5, 4, 3, 2, 1]]
]]))
energy_pooling = MaxAccPool2d(kernel_size=(2, 2), stride=(1, 1), padding=0)
energy_pooling(input)

It should give:

tensor([[[[0, 0, 0, 0, 0, 0, 1, 1, 1],
          [0, 1, 1, 2, 4, 0, 0, 1, 1],
          [1, 1, 0, 0, 0, 1, 0, 0, 0]],

         [[0, 0, 0, 0, 0, 0, 1, 1, 1],
          [0, 1, 1, 2, 4, 0, 0, 1, 1],
          [1, 1, 0, 0, 0, 1, 0, 0, 0]]]])

But I am getting:

tensor([[[[0, 0, 0, 0, 0, 0, 2, 2, 2],
          [0, 2, 2, 4, 8, 0, 0, 2, 2],
          [2, 2, 0, 0, 0, 2, 0, 0, 0]],

         [[0, 0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0, 0]]]])

Note that the two channels deliberately contain the same data; this is not a general use case, but it is enough to expose the bug.

Any idea of how to obtain the indices that I need? Or alternatives to bincount?

Thanks in advance,
José

Looking at the source code here and at this issue, I realize that what I need are the flattened indices, i.e. the indices ONNX returns.

With this in hand I will be able to use bincount as I intended.
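
For reference, here is a sketch of the fix I have in mind, combining the offset trick above with bincount; this is untested and assumes a 4-D (N, C, H, W) input:

import torch
import torch.nn as nn

class MaxAccPool2d(nn.Module):
    """Counts, for every input position, how often it wins the pooling."""
    def __init__(self, kernel_size, stride=None, padding=0, dilation=1):
        super(MaxAccPool2d, self).__init__()
        self.pool = nn.MaxPool2d(kernel_size, stride, padding, dilation,
                                 return_indices=True, ceil_mode=False)

    def forward(self, input):
        N, C, H, W = input.size()
        _, indices = self.pool(input)
        # lift the per-plane indices (values in [0, H*W)) to indices into the
        # flattened (N, C, H, W) tensor by adding each plane's starting offset
        offsets = torch.arange(N * C, device=indices.device).view(N, C, 1, 1) * (H * W)
        flat = (indices + offsets).view(-1)
        # if bincount stays slow on the GPU, scatter_add_ into a zeros tensor
        # of length N*C*H*W should be a drop-in alternative for this line
        return torch.bincount(flat, minlength=input.nelement()).view(N, C, H, W)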