I have a 6-dim tensor, say [1, 1, 4012, 6034, 11, 11]. How can I apply a convolution along the last two dims? I'd like to avoid the unfold operation, which causes an OOM error.
I assume this means you want to use a 2D convolution and treat the last two dims of your input as spatial sizes?
If so, what would the other dimensions represent and how would you treat them?
An nn.ConvXd layer expects an input of shape
[batch_size, channels, *dims], where
*dims corresponds to 2 spatial dimensions (height and width) for Conv2d, 3 volumetric dimensions (depth, height, width) for Conv3d, etc.
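To illustrate the expected shapes (a minimal sketch with arbitrary channel counts and spatial sizes, not taken from the question):

```python
import torch

# Conv2d expects [batch_size, channels, height, width]
conv2d = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(2, 3, 32, 32)        # batch of 2 RGB images
print(conv2d(x).shape)               # torch.Size([2, 8, 32, 32])

# Conv3d expects [batch_size, channels, depth, height, width]
conv3d = torch.nn.Conv3d(1, 4, kernel_size=3, padding=1)
v = torch.randn(2, 1, 16, 32, 32)    # batch of 2 single-channel volumes
print(conv3d(v).shape)               # torch.Size([2, 4, 16, 32, 32])
```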
In your use case it seems the spatial dimensions are fixed while other (undefined) dimensions are added, so could you explain your use case a bit more?
Thank you for your reply. What I want to do is split a tensor into sliding windows, then compute a window sum within each of these sliding windows.
For example, I unfold a [1,1,4022,6044] image into [1,1,4012,6034,11,11]; then, within each 11x11 window, I want to apply a summation over 3x3 windows. Although I could further unfold the last two dims, that causes an OOM error.
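For reference, the first unfold step might look like this on a toy input (sizes shrunk from 4022x6044 to 20x24 so it fits in memory; the window arithmetic is the same):

```python
import torch

# Stride-1 11x11 sliding windows over a small "image".
# 20 - 11 + 1 = 10 window positions in height, 24 - 11 + 1 = 14 in width.
img = torch.randn(1, 1, 20, 24)
windows = img.unfold(2, 11, 1).unfold(3, 11, 1)
print(windows.shape)   # torch.Size([1, 1, 10, 14, 11, 11])
```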
If I understand your use case correctly, you want to apply a 3x3 convolution
to each of the 11x11 patches spanned by your last two dimensions.
If this is right, you can use
reshape() (or view()) to push your two large
dimensions into the “batch” dimension (keeping a singleton “channels”
dimension) and then apply an ordinary Conv2d:
```
>>> import torch
>>> torch.__version__
'1.12.0'
>>> _ = torch.manual_seed(2022)
>>> conv = torch.nn.Conv2d(1, 1, 3, padding=1)
>>> t = torch.randn(1, 1, 412, 634, 11, 11)   # smaller tensor for this example
>>> u = conv(t.view(261208, 1, 11, 11)).view(1, 1, 412, 634, 11, 11)
>>> u.shape
torch.Size([1, 1, 412, 634, 11, 11])
```
I’m pretty sure that for large tensors
Conv2d is smart enough to use an
algorithm more memory-efficient than calling unfold().
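One further note: since the goal is a 3x3 window *sum* rather than a learned convolution, the same trick also works with F.conv2d and a fixed all-ones kernel, so no trainable weights are needed. A sketch, again with toy sizes standing in for the real [1, 1, 4012, 6034, 11, 11] tensor:

```python
import torch
import torch.nn.functional as F

# A window sum is just a convolution with an all-ones kernel.
t = torch.randn(1, 1, 50, 60, 11, 11)   # toy stand-in for the real tensor
ones = torch.ones(1, 1, 3, 3)           # fixed 3x3 summation kernel

# Fold the two large dims into the batch dim, convolve, then restore.
u = F.conv2d(t.view(-1, 1, 11, 11), ones, padding=1)
u = u.view(1, 1, 50, 60, 11, 11)
print(u.shape)   # torch.Size([1, 1, 50, 60, 11, 11])
```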
Excellent!! This should work in my case, thank you