Hi, I'm trying to compute a dot product between a sliding window and an image (shaped (1024, 2048)), similar to a convolution operation.
My goal is for each pixel's score to depend on the scores of its neighbourhood. Currently I've come up with two ways of doing it:
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
kernel_size = 3
pmap = torch.rand((1024, 2048)).to(device)
kernel = torch.rand((3, 3)).to(device)
padding = tuple([kernel_size // 2] * 4)
padded_pmap = torch.nn.functional.pad(pmap, padding)

# method1: unfold into (H, W, k, k) patches, then a broadcasted dot product
patches = padded_pmap.unfold(0, kernel_size, step=1).unfold(1, kernel_size, step=1)
kerenelized_pmap = (patches * kernel).sum((2, 3))

# method2: explicit loop over every output pixel
kerenelized_pmap2 = torch.zeros_like(pmap)
for i in range(pmap.shape[0]):
    for j in range(pmap.shape[1]):
        kerenelized_pmap2[i, j] = (padded_pmap[i:i + kernel_size, j:j + kernel_size] * kernel).sum((0, 1))
```
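For what it's worth, here is a small sanity check (on a hypothetical 8×10 input, so it runs instantly on CPU) confirming that the unfold approach and the explicit loop produce the same result:

```python
import torch
import torch.nn.functional as F

k = 3
x = torch.rand(8, 10)
w = torch.rand(k, k)
xp = F.pad(x, (k // 2,) * 4)

# method1: unfold into (H, W, k, k) patches, then a broadcasted dot product
patches = xp.unfold(0, k, 1).unfold(1, k, 1)
out1 = (patches * w).sum((2, 3))

# method2: explicit loop over every output pixel
out2 = torch.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        out2[i, j] = (xp[i:i + k, j:j + k] * w).sum()

assert torch.allclose(out1, out2)
```

Note that method1's `patches` tensor materializes H × W × k × k elements, which is why memory blows up for large kernels: at float32, 1024 × 2048 × 50 × 50 is roughly 20 GB.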
The problem is that method1 uses a lot of memory: with large kernels (50+) I exceed my device memory (and sometimes even with smaller ones).
Method2 is very slow. My next step would be to split the image into tiles and apply method1 to each, but I wanted to check whether someone has already faced a similar issue and has a better solution.
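One option worth considering (a sketch, assuming the goal is a cross-correlation, i.e. a sliding-window dot product without flipping the kernel): `torch.nn.functional.conv2d` computes exactly that, in a single fused, memory-efficient kernel, without ever materializing the (H, W, k, k) patch tensor. It only needs the input reshaped to (N, C, H, W) and the kernel to (out_channels, in_channels, kH, kW):

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
kernel_size = 3
pmap = torch.rand((1024, 2048), device=device)
kernel = torch.rand((kernel_size, kernel_size), device=device)

# conv2d expects input (N, C, H, W) and weight (out_C, in_C, kH, kW);
# with padding=k//2 the output keeps the input's spatial shape (for odd k)
out = F.conv2d(
    pmap.unsqueeze(0).unsqueeze(0),    # (1, 1, 1024, 2048)
    kernel.unsqueeze(0).unsqueeze(0),  # (1, 1, k, k)
    padding=kernel_size // 2,
).squeeze()                            # back to (1024, 2048)
```

This should match method1's output up to floating-point tolerance (PyTorch's `conv2d` is a cross-correlation, so no kernel flip is involved), and it scales to large kernels far better than unfolding.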