Does `as_strided()` copy data?

Hi,

I’m experimenting with torch.einsum and torch.as_strided to implement a convolution. Right now, my implementation uses approximately 6 times more memory than F.conv2d. I was wondering if the added memory consumption is from torch.as_strided copying data, or simply because my implementation is not as optimized as the CUDA kernel behind F.conv2d.

Also, I could not find documentation for as_strided; is it available somewhere?

Thanks, Lucas

Here’s my implementation, just in case:

import torch
import torch.nn as nn


class einsum_conv(nn.Module):
    def __init__(self, kernel_size):
        super(einsum_conv, self).__init__()
        self.ks = kernel_size

    def forward(self, x, kernel):
        # Accept a single image (c x h x w) by adding a batch dimension.
        if len(x.size()) == 3:
            x = x.unsqueeze(0)

        assert len(x.size()) == 4, 'need bs x c x h x w format'

        bs, in_c, h, w = x.size()
        ks = self.ks
        # View x as (bs, in_c, out_h, out_w, ks, ks) sliding windows;
        # the last two strides reuse the row/column strides, so the
        # windows overlap and no data is copied.
        strided_x = x.as_strided((bs, in_c, h - ks + 1, w - ks + 1, ks, ks),
                                 (h * w * in_c, h * w, w, 1, w, 1))

        # Contract over the input channels and the kernel window.
        out = torch.einsum('bihwkl,oikl->bohw', strided_x, kernel)
        return out
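
Here is a quick sanity check against F.conv2d (a minimal sketch, assuming a contiguous input, an out_c x in_c x ks x ks kernel, stride 1 and no padding):

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)
kernel = torch.randn(4, 3, 3, 3)  # out_c x in_c x ks x ks

conv = einsum_conv(kernel_size=3)
out = conv(x, kernel)
ref = F.conv2d(x, kernel)  # stride 1, no padding

print(out.shape)                             # torch.Size([2, 4, 6, 6])
print(torch.allclose(out, ref, atol=1e-5))   # True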

Hi,

as_strided does not copy any data; it only creates a view with different sizes and strides on the same storage.
The difference in memory usage might come from the fact that your implementation creates more intermediate results. Special care was taken in the Conv operation to reduce the number of intermediate results as much as possible.
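
For example, the result of as_strided shares its storage with the input: the data pointer is the same, and writing through the view modifies the original tensor.

import torch

x = torch.arange(16.).reshape(4, 4)
# 2x2 sliding windows over x, reusing x's row/column strides
windows = x.as_strided((3, 3, 2, 2), (4, 1, 4, 1))

print(windows.data_ptr() == x.data_ptr())  # True: same storage, no copy
windows[0, 0, 0, 0] = 100.
print(x[0, 0])                             # tensor(100.): the view aliases x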

Hi,

thanks for the answer!

-Lucas

It’s probably einsum that is using that much memory here. If you really want to implement a padding-less conv yourself, you can use as_strided + matmul (a gemm); it should be a lot faster.
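
Roughly, such a matmul version could look like this (a minimal sketch, assuming a contiguous input, stride 1 and no padding):

import torch

def strided_matmul_conv(x, kernel):
    bs, in_c, h, w = x.shape
    out_c, _, ks, _ = kernel.shape
    out_h, out_w = h - ks + 1, w - ks + 1

    # Overlapping-window view, with the window dimensions last.
    sx = x.as_strided((bs, out_h, out_w, in_c, ks, ks),
                      (in_c * h * w, w, 1, h * w, w, 1))

    # Flatten each window into one row and do a single gemm.
    cols = sx.reshape(bs, out_h * out_w, in_c * ks * ks)  # this copies once
    weight = kernel.reshape(out_c, in_c * ks * ks)
    out = cols.matmul(weight.t())                         # (bs, out_h*out_w, out_c)
    return out.permute(0, 2, 1).reshape(bs, out_c, out_h, out_w)

The single reshape materialises the im2col buffer once; after that, the whole contraction is one batched gemm.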

In more detail: as Simon says, einsum internally collapses the dimensions of the factors to reduce the contraction to a bmm. If that collapse cannot be done with a view, it performs a reshape and therefore copies the factor.
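
You can see this by flattening the strided view from above (a small sketch): it cannot be expressed as a view, so reshape falls back to a copy.

import torch

x = torch.randn(1, 3, 8, 8)
ks = 3
windows = x.as_strided((1, 3, 6, 6, ks, ks),
                       (3 * 8 * 8, 8 * 8, 8, 1, 8, 1))

print(windows.is_contiguous())            # False: the windows overlap
flat = windows.reshape(1, -1)             # view() would fail here, so reshape copies
print(flat.data_ptr() == x.data_ptr())    # False: a new buffer was allocated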

Best regards

Thomas
