Hi Absurd!

There are two issues here:

First, using for-loops generally slows pytorch down because they prevent pytorch from performing operations on entire tensors using its built-in (and hopefully optimized) tensor operations.
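
As a toy illustration of this point (not your actual computation), here is a loop-based row sum next to its vectorized equivalent:

```
import torch

t = torch.rand (1000, 1000)

# loop version: pytorch dispatches one small sum() for each row
row_sums_loop = torch.zeros (t.shape[0])
for i in range (t.shape[0]):
    row_sums_loop[i] = t[i].sum()

# vectorized version: a single tensor operation over the whole tensor
row_sums_vec = t.sum (1)

assert (row_sums_loop - row_sums_vec).abs().max() < 1.0e-5
```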

Second, in your method you sum over each rectangular ROI separately, potentially performing some of the same sums repeatedly. There is a standard approach for avoiding this, where you first perform (in your case) a two-dimensional cumulative sum over your `mask` tensor, and then reconstruct the ROI sums from the cumulative sum using the so-called inclusion-exclusion technique.
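
Concretely (a minimal single-channel sketch, with names of my own choosing): if `P` is the two-dimensional cumulative sum of `m`, zero-padded so that `P[i, j]` holds the sum of `m[:i, :j]`, then the sum over the ROI `m[r0:r2, c0:c2]` is `P[r2, c2] - P[r0, c2] - P[r2, c0] + P[r0, c0]`:

```
import torch

torch.manual_seed (2020)
m = (10.0 * torch.rand (3, 5)).long()   # single-channel toy mask

S = m.cumsum (0).cumsum (1)             # S[i, j] = m[:i + 1, :j + 1].sum()
P = torch.zeros (m.shape[0] + 1, m.shape[1] + 1).long()
P[1:, 1:] = S                           # pad so that P[i, j] = m[:i, :j].sum()

r0, c0, r2, c2 = 1, 1, 3, 4             # an example ROI
roi_sum = P[r2, c2] - P[r0, c2] - P[r2, c0] + P[r0, c0]
assert roi_sum == m[r0:r2, c0:c2].sum()
```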

(If you only have a few, or mostly non-overlapping ROIs, performing the ROI sums directly will be cheaper. But if you have a lot of overlapping ROIs, inclusion-exclusion can be dramatically more efficient. Also, gpu vectorization will affect the trade-off above and beyond a simple count of floating-point operations, so if this piece is performance-critical, run some timing tests on realistic data.)
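
For what it's worth, here is the skeleton of such a timing test, a rough sketch that assumes a more recent pytorch version than the 0.3.0 script below, with sizes I made up; wrap the timed sections with `torch.cuda.synchronize()` if you run it on the gpu:

```
import time
import torch

torch.manual_seed (2020)
C, H, W, N = 16, 256, 256, 1000
mask = (10.0 * torch.rand (C, H, W)).long()

# random ROIs with sorted corner coordinates, just for benchmarking
r = torch.randint (0, H + 1, (N, 2)).sort (dim = 1)[0]
c = torch.randint (0, W + 1, (N, 2)).sort (dim = 1)[0]
proposals = torch.stack ([r[:, 0], c[:, 0], r[:, 1], c[:, 1]], dim = 1)

t0 = time.time()
totals = []
for p in proposals:   # direct per-ROI sums
    totals.append (mask[:, p[0]: p[2], p[1]: p[3]].sum (2).sum (1))
loop_result = torch.stack (totals, dim = 1)
print ('loop version:               ', time.time() - t0)

t0 = time.time()
cmsk = mask.cumsum (1).cumsum (2)
pmsk = torch.nn.ZeroPad2d ((1, 0, 1, 0))(cmsk)
msum = (pmsk[:, proposals[:, 2], proposals[:, 3]]
        - pmsk[:, proposals[:, 0], proposals[:, 3]]
        - pmsk[:, proposals[:, 2], proposals[:, 1]]
        + pmsk[:, proposals[:, 0], proposals[:, 1]])
print ('inclusion-exclusion version:', time.time() - t0)

assert (msum == loop_result).all()   # both versions should agree
```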

Here is a pytorch (version 0.3.0) script that reimplements your loop solution using inclusion-exclusion with pure pytorch tensor operations:

```
import torch
torch.__version__
torch.manual_seed (2020)
C = 2
H = 3
W = 5
# N = 5
proposals = torch.LongTensor ([[0, 0, 0, 0], [2, 3, 2, 5], [0, 0, 1, 1], [1, 1, 3, 4], [2, 4, 3, 5]])
mask = (10.0 * torch.rand (C, H, W)).long() # some random integers
proposals
mask
# loop version
total_numbers = []
for i in range(proposals.shape[0]): # .shape[1] was incorrect
    proposal = proposals[i]
    total_number = torch.zeros (mask.shape[0]).long() # empty ROIs have sum of 0
    if proposal[2] > proposal[0] and proposal[3] > proposal[1]:
        mask_region = mask[:, proposal[0]: proposal[2], proposal[1]: proposal[3]]
        total_number = mask_region.sum (2).sum (1)
    total_numbers.append(total_number)
total_numbers = torch.stack (total_numbers, dim = 1) # make a 2d tensor instead of a list of tensors
total_numbers
# cumulative sums, inclusion-exclusion version
cmsk = mask.cumsum (1).cumsum (2) # pre-compute cumulative sums
pmsk = torch.nn.ZeroPad2d ((1, 0, 1, 0))(cmsk) # pad to support inclusion-exclusion indexing
# inclusion-exclusion ROI sum
msum = pmsk[:, proposals[:, 2], proposals[:, 3]] - pmsk[:, proposals[:, 0], proposals[:, 3]] - pmsk[:, proposals[:, 2] , proposals[:, 1]] + pmsk[:, proposals[:, 0], proposals[:, 1]]
msum
```

And here is the output:

```
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x0000019B5E066630>
>>>
>>> C = 2
>>> H = 3
>>> W = 5
>>> # N = 5
...
>>> proposals = torch.LongTensor ([[0, 0, 0, 0], [2, 3, 2, 5], [0, 0, 1, 1], [1, 1, 3, 4], [2, 4, 3, 5]])
>>> mask = (10.0 * torch.rand (C, H, W)).long() # some random integers
>>>
>>> proposals
 0  0  0  0
 2  3  2  5
 0  0  1  1
 1  1  3  4
 2  4  3  5
[torch.LongTensor of size 5x4]
>>> mask
(0 ,.,.) =
  4  1  5  1  4
  2  5  8  2  6
  5  8  7  4  8

(1 ,.,.) =
  5  7  0  1  8
  0  5  8  3  4
  5  0  7  3  4
[torch.LongTensor of size 2x3x5]
>>>
>>> # loop version
... total_numbers = []
>>> for i in range(proposals.shape[0]): # .shape[1] was incorrect
...     proposal = proposals[i]
...     total_number = torch.zeros (mask.shape[0]).long() # empty ROIs have sum of 0
...     if proposal[2] > proposal[0] and proposal[3] > proposal[1]:
...         mask_region = mask[:, proposal[0]: proposal[2], proposal[1]: proposal[3]]
...         total_number = mask_region.sum (2).sum (1)
...     total_numbers.append(total_number)
...
>>> total_numbers = torch.stack (total_numbers, dim = 1) # make a 2d tensor instead of a list of tensors
>>>
>>> total_numbers
  0   0   4  34   8
  0   0   5  26   4
[torch.LongTensor of size 2x5]
>>>
>>> # cumulative sums, inclusion-exclusion version
... cmsk = mask.cumsum (1).cumsum (2) # pre-compute cumulative sums
>>> pmsk = torch.nn.ZeroPad2d ((1, 0, 1, 0))(cmsk) # pad to support inclusion-exclusion indexing
>>> # inclusion-exclusion ROI sum
... msum = pmsk[:, proposals[:, 2], proposals[:, 3]] - pmsk[:, proposals[:, 0], proposals[:, 3]] - pmsk[:, proposals[:, 2] , proposals[:, 1]] + pmsk[:, proposals[:, 0], proposals[:, 1]]
>>>
>>> msum
Variable containing:
  0   0   4  34   8
  0   0   5  26   4
[torch.LongTensor of size 2x5]
```

Good luck.

K. Frank