I want to modify the shape of the convolution layer so that it doesn’t just calculate in the up and down directions. It should be something like deformable convolution, but with less flexibility.
For example, I may want to implement a 2D conv with a shape like this:

Yes, they are the kernel layouts.
What do you mean by ‘zero out their gradients after each backward call’? Do you mean multiplying the weights by a mask at each iteration?

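If you have the mask for the current pattern, you could use it to zero out the masked entries of the randomly initialized weight matrix, and then again to zero out the corresponding entries of the populated gradients after each backward() call.
Each backward() call accumulates the gradients in the param.grad attribute, so the mask has to be applied to the gradients every iteration, before the optimizer step.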
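
Here is a minimal sketch of that idea, in case it helps. The diagonal 3x3 mask, the layer sizes, the random data, and the plain SGD optimizer are all placeholders I made up for illustration; swap in the mask for your own kernel layout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 3x3 kernel layout: 1 = trainable tap, 0 = tap kept at zero.
mask = torch.tensor([[1., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 1.]])

conv = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3, padding=1)

# Zero out the masked positions of the randomly initialized weight once.
# The (3, 3) mask broadcasts over the (out_channels, in_channels) dims.
with torch.no_grad():
    conv.weight.mul_(mask)

optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

# Placeholder input and target, only there to produce a loss.
x = torch.randn(8, 2, 16, 16)
target = torch.randn(8, 4, 16, 16)

for _ in range(3):
    optimizer.zero_grad()            # clear gradients from the last iteration
    loss = nn.functional.mse_loss(conv(x), target)
    loss.backward()                  # populates conv.weight.grad
    conv.weight.grad.mul_(mask)      # zero the gradients of the masked taps
    optimizer.step()                 # masked taps get a zero update

# Weights at the masked positions, shape (out_channels, in_channels, 6).
masked_out = conv.weight.detach()[:, :, mask == 0]
```

After the loop, `masked_out` should contain only zeros, while the three taps on the diagonal have been trained as usual. The mask is applied to the gradient rather than inside the forward pass, so the convolution itself is an ordinary `nn.Conv2d`.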