How to use matrix manipulation to make this faster?

Hi, I hope to use some kind of matrix manipulation to make this code faster:

for i in range(dl_dw.size(2)):
    for j in range(dl_dw.size(3)):
        dl_dx[:, :, i//self.scale_factor, j//self.scale_factor] += dl_dw[:, :, i, j]

Is it possible? Thanks!

When sharing code, can you share a minimal reproducible example, not just the function?

So share an example input and output for this function to show the expected behavior.

Hi, thanks for the reply! The idea is to reduce the size of the matrix by summing each submatrix of the original matrix. In this example the scale factor is 2, and each matrix is a 4D tensor, but we only need to consider the last two dimensions:

a = torch.tensor([[1.,2,5,6], [3,4,7,8], [1,1,2,2], [1,1,3,4]], requires_grad=True).view(1,1,4,4)
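To make this a complete example, here is a minimal runnable sketch (with `scale_factor` hard-coded to 2, and `out` standing in for `dl_dx`) that runs the original loop on this input to show the expected behavior:

```python
import torch

a = torch.tensor([[1., 2, 5, 6],
                  [3, 4, 7, 8],
                  [1, 1, 2, 2],
                  [1, 1, 3, 4]], requires_grad=True).view(1, 1, 4, 4)

scale_factor = 2
out = torch.zeros(1, 1, 2, 2)
for i in range(a.size(2)):
    for j in range(a.size(3)):
        # Each element of a is accumulated into the 2x2 block it belongs to
        out[:, :, i // scale_factor, j // scale_factor] += a[:, :, i, j]

# out now contains the per-block sums [[10., 26.], [4., 11.]]
```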

You could do one of the following

  • Use a convolutional layer with a custom all-ones kernel
conv = torch.nn.Conv2d(1, 1, 2, 2, bias=False)
kernel = torch.tensor([[1., 1.],
                       [1., 1.]]).unsqueeze(0).unsqueeze(0)
with torch.no_grad():
    conv.weight = torch.nn.Parameter(kernel)

out = conv(a)
# Output:
#tensor([[[[10., 26.],
#          [ 4., 11.]]]], grad_fn=<ConvolutionBackward0>)
  • Use average pooling and multiply by the number of pixels inside the sliding window
avg = torch.nn.AvgPool2d(2, 2)
out = avg(a) * 4  # 4 = number of elements in each 2x2 window
# Output:
#tensor([[[[10., 26.],
#          [ 4., 11.]]]], grad_fn=<MulBackward0>)

Hope this helps :smile:


Thanks for the solution! Yes, this would work. I also wonder if there’s a way to achieve this without the nn modules, using only tensor operations (and no loops) :blush:

You could do something like this

a = torch.tensor([[[[1.,2,5,6], [3,4,7,8], [1,1,2,2], [1,1,3,4]]]], requires_grad=True)
B, C, H, W = a.shape

kernel = 2
out_h = H // kernel
out_w = W // kernel

# Top-left row/column index of every kernel x kernel block, in row-major order
rows = torch.arange(0, H, kernel).repeat_interleave(out_w)
cols = torch.arange(0, W, kernel).repeat(out_h)

x0y0 = a[..., rows+0, cols+0]
x0y1 = a[..., rows+0, cols+1]
x1y0 = a[..., rows+1, cols+0]
x1y1 = a[..., rows+1, cols+1]

out = (x0y0 + x0y1 + x1y0 + x1y1).reshape(B, C, out_h, out_w)

If you want a different kernel size, then you need to add the necessary xMyN = a[..., rows+M, cols+N] terms and sum them all up.

But this workaround is very error-prone. I would rather use one of the other solutions given, as they are more robust and will not break as easily.
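That said, there is a pure-tensor variant that generalizes to any kernel size without enumerating offsets: reshape so each block gets its own axes, then sum over those axes. A sketch, assuming H and W are divisible by the kernel size:

```python
import torch

a = torch.tensor([[[[1., 2, 5, 6], [3, 4, 7, 8], [1, 1, 2, 2], [1, 1, 3, 4]]]],
                 requires_grad=True)
B, C, H, W = a.shape
kernel = 2

# Split H into (H // kernel, kernel) and W into (W // kernel, kernel),
# then sum over the two within-block axes to get the block sums.
out = a.reshape(B, C, H // kernel, kernel, W // kernel, kernel).sum(dim=(3, 5))

# out contains the per-block sums [[10., 26.], [4., 11.]] and keeps the graph
```

Since only `reshape` and `sum` are used, gradients flow through `out` exactly as with the nn-module solutions.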
