maybe repeat is the better fit here, though I cannot give a strong argument for one over the other. we can work through an example to see how they behave. suppose we have,

```
import torch

a = torch.randn(3, 3); a
```

```
tensor([[-1.6339,  0.5842,  0.3604],
        [ 0.2360,  0.5282,  1.8202],
        [-0.2033, -0.4426, -0.0329]])
```

suppose we want a repeated tensor of shape (2, 3, 3).

one way would be,

```
torch.repeat_interleave(a, 2).reshape(2, 3, 3)
```

and get,

```
tensor([[[-1.6339, -1.6339,  0.5842],
         [ 0.5842,  0.3604,  0.3604],
         [ 0.2360,  0.2360,  0.5282]],

        [[ 0.5282,  1.8202,  1.8202],
         [-0.2033, -0.2033, -0.4426],
         [-0.4426, -0.0329, -0.0329]]])
```
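it is worth noting why the reshape is needed there: without a dim argument, repeat_interleave flattens the input first and then repeats each element. a tiny sketch with made-up values:

```python
import torch

# hypothetical 2x2 tensor, just to make the element order visible
a = torch.tensor([[1., 2.], [3., 4.]])

# with no dim argument, the input is flattened first,
# then every element is repeated in place
out = torch.repeat_interleave(a, 2)
print(out)  # tensor([1., 1., 2., 2., 3., 3., 4., 4.])
```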

another way would be,

```
torch.repeat_interleave(a, torch.tensor([2]), dim=0).reshape(2, 3, 3)
```

we get,

```
tensor([[[-1.6339,  0.5842,  0.3604],
         [-1.6339,  0.5842,  0.3604],
         [ 0.2360,  0.5282,  1.8202]],

        [[ 0.2360,  0.5282,  1.8202],
         [-0.2033, -0.4426, -0.0329],
         [-0.2033, -0.4426, -0.0329]]])
```
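as an aside, I believe a one-element repeats tensor is broadcast across the dimension, so a plain scalar repeat count should give the same result; a quick check:

```python
import torch

a = torch.randn(3, 3)

# a one-element repeats tensor broadcasts to every row along dim=0,
# so a scalar repeat count produces the identical (6, 3) result
x = torch.repeat_interleave(a, torch.tensor([2]), dim=0)
y = torch.repeat_interleave(a, 2, dim=0)
print(torch.equal(x, y))  # True
```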

or we use repeat,

```
a.repeat(2, 1).reshape(2, 3, 3)
```

```
tensor([[[-1.6339,  0.5842,  0.3604],
         [ 0.2360,  0.5282,  1.8202],
         [-0.2033, -0.4426, -0.0329]],

        [[-1.6339,  0.5842,  0.3604],
         [ 0.2360,  0.5282,  1.8202],
         [-0.2033, -0.4426, -0.0329]]])
```
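if I am not mistaken, the repeat + reshape here just stacks two full copies of a along a new leading dim, so unsqueezing first (or expand, which avoids copying) gives the same tensor; a quick check:

```python
import torch

a = torch.randn(3, 3)

# repeat(2, 1) makes a (6, 3) tensor of two stacked copies;
# reshaping to (2, 3, 3) is the same as repeating along a new batch dim
stacked = a.repeat(2, 1).reshape(2, 3, 3)
batched = a.unsqueeze(0).repeat(2, 1, 1)
viewed = a.expand(2, 3, 3)  # a broadcasted view, no copy made

print(torch.equal(stacked, batched))  # True
print(torch.equal(stacked, viewed))   # True
```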

if we were applying something like MaxPool after this, then in the first two cases a 2x2 window would see duplicated values next to each other, something like,

```
[-1.6339, 0.5842]
[-1.6339, 0.5842]
```

which seems a bit odd, since the pooling ends up comparing identical values.
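to make that concrete, a small sketch with made-up integer values; the window below is what a 2x2 pooling kernel would see in the row-interleaved case:

```python
import torch

# hypothetical values so the duplication is easy to spot
a = torch.arange(1., 10.).reshape(3, 3)

# row-wise interleave, as in the second variant above
rep = torch.repeat_interleave(a, 2, dim=0).reshape(2, 3, 3)

# the top-left 2x2 window a MaxPool kernel would see
window = rep[0, :2, :2]
print(window)
# both rows are the same duplicated row of `a`
print(torch.equal(window[0], window[1]))  # True
```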

or if we were applying some Dropout variant that drops 2x2 blocks at random, then an entire block of,

```
[-1.6339, 0.5842]
[-1.6339, 0.5842]
```

might become zero, and since both copies of those values sit inside that one block, we would not have them anywhere else. this again seems a bit odd.
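a small sketch of that concern, with made-up values and a hand-written mask standing in for an actual dropout layer (as far as I know there is no built-in 2x2 block dropout):

```python
import torch

# hypothetical values, chosen so every entry is distinct and nonzero
a = torch.arange(1., 10.).reshape(3, 3)

# interleaved variant: both copies of a value end up adjacent
inter = torch.repeat_interleave(a, 2, dim=0).reshape(2, 3, 3)
# plain repeat: the two copies live in separate full blocks
stack = a.repeat(2, 1).reshape(2, 3, 3)

# zero out the same top-left 2x2 block in each
inter2, stack2 = inter.clone(), stack.clone()
inter2[0, :2, :2] = 0
stack2[0, :2, :2] = 0

# a[0, 0] == 1 disappears entirely from the interleaved version,
# but survives in the second copy of the stacked version
print(bool((inter2 == 1.).any()))  # False
print(bool((stack2 == 1.).any()))  # True
```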

or if we were applying something like nn.LayerNorm after this. if we have rows that consist of a single repeated value, like,

```
import torch
from torch import nn

a = torch.tensor([[1., 1., 1.], [2., 2., 2.]])
b = nn.LayerNorm(3)
```

then, b(a) would give

```
tensor([[0., 0., 0.],
        [0., 0., 0.]], grad_fn=<NativeLayerNormBackward>)
```

maybe we do not want to zero out most of the blocks.
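a quick sketch of the contrast, assuming the default LayerNorm affine parameters (weight 1, bias 0):

```python
import torch
from torch import nn

ln = nn.LayerNorm(3)

const = torch.tensor([[1., 1., 1.]])   # a row of identical values
varied = torch.tensor([[1., 2., 3.]])  # a row of distinct values

# zero variance means the whole row normalizes to zero,
# while a varied row keeps its relative structure
print(ln(const))   # all zeros
print(ln(varied))  # roughly [-1.22, 0.00, 1.22]
```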

it depends on what we apply after this repeat.