How to use matrix manipulation to make this faster?

Hi, I'm hoping to use some kind of matrix manipulation to make this code faster:

for i in range(dl_dw.size(2)):
    for j in range(dl_dw.size(3)):
        dl_dx[:, :, i // self.scale_factor, j // self.scale_factor] += dl_dw[:, :, i, j]

Is it possible? Thanks!

When sharing code, could you post a minimal reproducible example rather than just the function?

That is, please share an example input and output for this function to show the expected behavior.


Hi, thanks for the reply! The idea is to reduce the size of a matrix by summing each submatrix of the original matrix. In this example the scale factor is 2, and each matrix is a 4D tensor, but we only need to consider the last two dimensions:

import torch

a = torch.tensor([[1., 2, 5, 6], [3, 4, 7, 8], [1, 1, 2, 2], [1, 1, 3, 4]], requires_grad=True).view(1, 1, 4, 4)
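Running the original loop on this input gives the expected output. Here is a standalone sketch (the surrounding class isn't shown, so scale_factor = 2 and a zero-initialized dl_dx are assumptions, and a is detached just to keep the printout clean):

scale_factor = 2  # assumed value of self.scale_factor
inp = a.detach()
dl_dx = torch.zeros(1, 1, inp.size(2) // scale_factor, inp.size(3) // scale_factor)

# Each input pixel is accumulated into the output cell
# covering its scale_factor x scale_factor block
for i in range(inp.size(2)):
    for j in range(inp.size(3)):
        dl_dx[:, :, i // scale_factor, j // scale_factor] += inp[:, :, i, j]

print(dl_dx)
# Output:
#tensor([[[[10., 26.],
#          [ 4., 11.]]]])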

You could do one of the following:

  • Use a convolutional layer with a custom all-ones kernel
conv = torch.nn.Conv2d(1, 1, kernel_size=2, stride=2, bias=False)
kernel = torch.tensor([[1., 1.],
                       [1., 1.]]).unsqueeze(0).unsqueeze(0)  # shape (1, 1, 2, 2)
with torch.no_grad():
    conv.weight = torch.nn.Parameter(kernel)

print(conv(a))
# Output:
#tensor([[[[10., 26.],
#          [ 4., 11.]]]], grad_fn=<ConvolutionBackward0>)
  • Use average pooling and multiply by the number of pixels inside the sliding window (a functional variant is shown after this list)
avg = torch.nn.AvgPool2d(2,2)
print(avg(a)*4)
# Output:
#tensor([[[[10., 26.],
#          [ 4., 11.]]]], grad_fn=<MulBackward0>)
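
As a side note, assuming a reasonably recent PyTorch version, the functional form of average pooling accepts a divisor_override argument, which gives you the block sum directly:

import torch.nn.functional as F

# divisor_override=1 divides by 1 instead of the pooling area, i.e. sum pooling
print(F.avg_pool2d(a, kernel_size=2, stride=2, divisor_override=1))
# Same values as above: [[10., 26.], [ 4., 11.]]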

Hope this helps :smile:


Thanks for the solution! Yes, this would work. I also wonder if there's a way to achieve this without using the nn modules, only tensor operations (and without loops) :blush:

You could do something like this:

import torch

a = torch.tensor([[[[1., 2, 5, 6], [3, 4, 7, 8], [1, 1, 2, 2], [1, 1, 3, 4]]]], requires_grad=True)
B, C, H, W = a.shape

kernel = 2
out_h = H // kernel
out_w = W // kernel

# Top-left corner (row, col) of every kernel x kernel block, in row-major order
rows = torch.arange(0, H, kernel).repeat_interleave(out_w)
cols = torch.arange(0, W, kernel).repeat(out_h)

# One term per offset inside the block
x0y0 = a[..., rows+0, cols+0]
x0y1 = a[..., rows+0, cols+1]
x1y0 = a[..., rows+1, cols+0]
x1y1 = a[..., rows+1, cols+1]

out = (x0y0 + x0y1 + x1y0 + x1y1).reshape(B, C, out_h, out_w)
print(out)

If you want a different kernel size, you need to add the corresponding terms xMyN = a[..., rows+M, cols+N] for every offset (M, N) inside the kernel and sum them all up.

But this workaround is very error-prone. I would rather use one of the other solutions given, as they are more robust and will not break as easily.
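
If you really want to stay with pure tensor ops, a less fragile sketch (continuing from the snippet above, and assuming H and W are divisible by the kernel size) is to split each spatial dimension into (blocks, kernel) with reshape and sum over the kernel axes:

# H splits into (out_h, kernel) and W into (out_w, kernel); summing the
# two kernel axes yields the block sums for any kernel size
out = a.reshape(B, C, out_h, kernel, out_w, kernel).sum(dim=(3, 5))
print(out)
# Same values: [[10., 26.], [ 4., 11.]]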
