How to quickly sum up all values inside rectangular ROIs of a feature map

Suppose that I have an N × 4 tensor holding N rectangular ROIs of a mask (C × H × W), and I want to sum all the values of each channel inside each ROI to get a tensor of size N × C. The only way I can figure out is a for loop, but this method is very slow, so I want to ask if there is a faster way to do this.
The for-loop method is as below:

proposals # N * 4 tensor (ROIs)
mask # C * H * W tensor
total_numbers = []
for i in range(proposals.shape[1]):
    proposal = proposals[i]
    mask_region = mask[:, proposal[0]: proposal[2], proposal[1]: proposal[3]]
    total_number = mask_region.sum(dim=(1, 2))
    total_numbers.append(total_number)

Hi Absurd!

There are two issues here:

First, using for-loops generally slows pytorch down because they
prevent pytorch from performing operations on entire tensors using
its built-in (and hopefully optimized) tensor operations.
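
(A toy illustration of this point, not from the original post: summing
the rows of a matrix with a python loop versus a single tensor call
gives the same result, but the loop pays python and kernel-launch
overhead on every iteration.)

import torch

x = torch.rand(1000, 1000)

# loop version: python overhead (and a separate kernel launch) per row
row_sums_loop = torch.stack([x[i].sum() for i in range(x.shape[0])])

# vectorized version: one tensor operation over the whole matrix
row_sums_vec = x.sum(dim=1)

assert torch.allclose(row_sums_loop, row_sums_vec)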

Second, in your method you sum over each rectangular ROI separately,
potentially performing some of the same sums repeatedly. There is
a standard approach for avoiding this, where you first perform (in your
case) a two-dimensional cumulative sum over your mask tensor, and
then reconstruct the ROI sums from the cumulative sum using the
so-called inclusion-exclusion technique.
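
To make the identity concrete, here is a minimal sketch (written for a
current pytorch version; the names m, P, and the ROI coordinates are
placeholders of mine): if P is the zero-padded two-dimensional
cumulative sum of m, any rectangular sum falls out of four lookups:

import torch

m = torch.arange(12.0).reshape(3, 4)   # toy 3 x 4 "feature map"
P = torch.nn.functional.pad(m.cumsum(0).cumsum(1), (1, 0, 1, 0))   # padded 2d cumsum

r0, c0, r1, c1 = 1, 1, 3, 3   # one ROI, half-open [r0:r1, c0:c1]
direct = m[r0:r1, c0:c1].sum()
via_ie = P[r1, c1] - P[r0, c1] - P[r1, c0] + P[r0, c0]
assert torch.equal(direct, via_ie)   # inclusion-exclusion identity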

(If you only have a few, or mostly non-overlapping ROIs, performing the
ROI sums directly will be cheaper. But if you have a lot of overlapping
ROIs, inclusion-exclusion can be dramatically more efficient. Also, gpu
vectorization will affect the trade-off above and beyond a simple count
of floating-point operations, so if this piece is performance-critical, run
some timing tests on realistic data.)
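
(If you do time it, a rough harness along these lines would do. This is
just a sketch: roi_sums_loop and roi_sums_cumsum are hypothetical names
for your two implementations, and on a gpu you would call
torch.cuda.synchronize() before reading the clock.)

import time
import torch

def avg_seconds(fn, n_runs=20):
    # crude wall-clock timer; on a gpu, synchronize before and after the loop
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# usage (hypothetical function names):
# t_loop = avg_seconds(lambda: roi_sums_loop(mask, proposals))
# t_ie   = avg_seconds(lambda: roi_sums_cumsum(mask, proposals))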

Here is a pytorch (version 0.3.0) script that reimplements your loop
solution (with its indexing bug fixed) and then reproduces it using
inclusion-exclusion with pure pytorch tensor operations:

import torch
torch.__version__

torch.manual_seed (2020)

C = 2
H = 3
W = 5
# N = 5

proposals = torch.LongTensor ([[0, 0, 0, 0], [2, 3, 2, 5], [0, 0, 1, 1], [1, 1, 3, 4], [2, 4, 3, 5]])
mask = (10.0 * torch.rand (C, H, W)).long()   # some random integers

proposals
mask

# loop version
total_numbers = []
for i in range(proposals.shape[0]):  # .shape[1] was incorrect
    proposal = proposals[i]
    total_number = torch.zeros (mask.shape[0]).long()  # empty ROIs have sum of 0
    if  proposal[2] > proposal[0]  and  proposal[3] > proposal[1]:
        mask_region = mask[:, proposal[0]: proposal[2], proposal[1]: proposal[3]]
        total_number = mask_region.sum (2).sum (1)
    total_numbers.append(total_number)

total_numbers = torch.stack (total_numbers, dim = 1)   # make a 2d tensor instead of a list of tensors

total_numbers

# cumulative sums, inclusion-exclusion version
cmsk = mask.cumsum (1).cumsum (2)   # pre-compute cumulative sums
pmsk = torch.nn.ZeroPad2d ((1, 0, 1, 0))(cmsk)   # pad to support inclusion-exclusion indexing
# inclusion-exclusion ROI sum
msum = pmsk[:, proposals[:, 2], proposals[:, 3]] - pmsk[:, proposals[:, 0], proposals[:, 3]] - pmsk[:, proposals[:, 2] , proposals[:, 1]] + pmsk[:, proposals[:, 0], proposals[:, 1]]

msum

And here is the output:

>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x0000019B5E066630>
>>>
>>> C = 2
>>> H = 3
>>> W = 5
>>> # N = 5
...
>>> proposals = torch.LongTensor ([[0, 0, 0, 0], [2, 3, 2, 5], [0, 0, 1, 1], [1, 1, 3, 4], [2, 4, 3, 5]])
>>> mask = (10.0 * torch.rand (C, H, W)).long()   # some random integers
>>>
>>> proposals

 0  0  0  0
 2  3  2  5
 0  0  1  1
 1  1  3  4
 2  4  3  5
[torch.LongTensor of size 5x4]

>>> mask

(0 ,.,.) =
  4  1  5  1  4
  2  5  8  2  6
  5  8  7  4  8

(1 ,.,.) =
  5  7  0  1  8
  0  5  8  3  4
  5  0  7  3  4
[torch.LongTensor of size 2x3x5]

>>>
>>> # loop version
... total_numbers = []
>>> for i in range(proposals.shape[0]):  # .shape[1] was incorrect
...     proposal = proposals[i]
...     total_number = torch.zeros (mask.shape[0]).long()  # empty ROIs have sum of 0
...     if  proposal[2] > proposal[0]  and  proposal[3] > proposal[1]:
...         mask_region = mask[:, proposal[0]: proposal[2], proposal[1]: proposal[3]]
...         total_number = mask_region.sum (2).sum (1)
...     total_numbers.append(total_number)
...
>>> total_numbers = torch.stack (total_numbers, dim = 1)   # make a 2d tensor instead of a list of tensors
>>>
>>> total_numbers

  0   0   4  34   8
  0   0   5  26   4
[torch.LongTensor of size 2x5]

>>>
>>> # cumulative sums, inclusion-exclusion version
... cmsk = mask.cumsum (1).cumsum (2)   # pre-compute cumulative sums
>>> pmsk = torch.nn.ZeroPad2d ((1, 0, 1, 0))(cmsk)   # pad to support inclusion-exclusion indexing
>>> # inclusion-exclusion ROI sum
... msum = pmsk[:, proposals[:, 2], proposals[:, 3]] - pmsk[:, proposals[:, 0], proposals[:, 3]] - pmsk[:, proposals[:, 2] , proposals[:, 1]] + pmsk[:, proposals[:, 0], proposals[:, 1]]
>>>
>>> msum
Variable containing:
  0   0   4  34   8
  0   0   5  26   4
[torch.LongTensor of size 2x5]
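
As a sanity check (not part of the script above), the two results can
be compared directly. In a current pytorch version something like this
would do; in 0.3.0, where msum comes back as a Variable, you would
compare msum.data instead:

assert torch.equal(total_numbers, msum)   # both versions agree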

Good luck.

K. Frank


Thanks for your reply, it helped me a lot! In my case N is large, so the inclusion-exclusion version is much faster than the loop version.