I have the following kernel:
{'padding': 2, 'weight': tensor([[[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [1., 1., 1., 1., 1.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]]], device='cuda:0')}
I also have an image tensor of shape [B, 320, 256, 128].
I want to apply this kernel to each channel independently (treat each channel as a 2D image and apply a standard 2D convolution).
Currently I do it like this, but it is very slow:
for b in range(0, dpv_permuted.shape[0]):
    for c in range(0, dpv_permuted.shape[1]):
        dpv_permuted[b,c,:,:] = F.conv2d(dpv_permuted[b,c].unsqueeze(0).unsqueeze(0), **spread_kernel).squeeze(0).squeeze(0)
How do I make this faster?
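
One direction that may help, as a minimal sketch rather than a definitive answer: fold the channel dimension into the batch dimension so that a single F.conv2d call processes every channel as its own 1-channel image. The helper name spread_per_channel is hypothetical, the dict is assumed to be called spread_kernel as in the loop above, and the tensor here is a small CPU stand-in for the real [B, 320, 256, 128] data.

```python
import torch
import torch.nn.functional as F


def spread_per_channel(x, weight, padding):
    """Apply one single-channel 2D kernel to every channel of x independently.

    x:      tensor of shape [B, C, H, W]
    weight: tensor of shape [1, 1, kH, kW]
    (Hypothetical helper, not part of the original code.)
    """
    b, c, h, w = x.shape
    # Treat each of the B*C channels as a separate 1-channel image.
    out = F.conv2d(x.reshape(b * c, 1, h, w), weight, padding=padding)
    # Restore the original batch/channel layout.
    return out.reshape(b, c, out.shape[-2], out.shape[-1])


# Same kernel as in the question, built on the CPU for illustration.
weight = torch.zeros(1, 1, 5, 5)
weight[0, 0, 2, :] = 1.0
spread_kernel = {"padding": 2, "weight": weight}

# Small stand-in for the real tensor (assumed shape convention [B, C, H, W]).
dpv_permuted = torch.rand(2, 8, 16, 12)
result = spread_per_channel(dpv_permuted, **spread_kernel)
```

An equivalent formulation is a grouped (depthwise) convolution: F.conv2d(x, weight.repeat(c, 1, 1, 1), padding=2, groups=c). Since only the middle row of this particular kernel is non-zero, a [1, 1, 1, 5] kernel with padding=(0, 2) should give the same result with less work. Which variant is fastest on a given GPU is something to measure rather than assume.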