I’m extracting patches out of a `Tensor` with the `unfold` method. Consider the following snippet:

``````
import torch

torch.manual_seed(0)

numel = 10
length = 3
stride = 2

input = torch.ones(numel, requires_grad=True)  # all ones, so the gradients below are easy to read
target = torch.zeros(numel)

input_patches = input.unfold(0, length, stride)
target_patches = target.unfold(0, length, stride)
``````

In the subsequent processing I calculate some loss between the `input_patches` and `target_patches`:

``````
loss = torch.sum((input_patches - target_patches) ** 2.0)
loss.backward()
``````

As a result, some elements of `input` receive a higher gradient (`input.grad`) than others, since they are included in multiple patches:

``````
tensor([2., 2., 4., 2., 4., 2., 4., 2., 2., 0.])
``````

Within my application this is unwanted. I can calculate the number of times each element was used as follows:

``````
counts = torch.zeros(numel)
count_patch = torch.ones(length)

for idx in range(0, numel - length + 1, stride):
    counts[idx:idx + length] += count_patch

print(counts)
print(input.grad / torch.clamp(counts, min=1.0))  # clamp to avoid zero division
``````

This results in ‘stable’ gradients

``````
tensor([1., 1., 2., 1., 2., 1., 2., 1., 1., 0.])
tensor([2., 2., 2., 2., 2., 2., 2., 2., 2., 0.])
``````

which is what I need.

1. While this example is in 1D, I actually need 2D data. I have not tested it yet, but I imagine the needed double `for` loop is costly. Is there a better / faster / more efficient way to achieve the same result with builtins from `torch`?
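For the 2D case, the counts can be computed without an explicit double loop by pairing `torch.nn.functional.unfold` with `torch.nn.functional.fold`: `fold` sums overlapping patch values, so folding an unfolded tensor of ones yields exactly how often each element appears in a patch. A minimal sketch (the sizes here are made up for illustration):

```python
import torch
import torch.nn.functional as F

height, width = 6, 6
length = 3   # square patches of size length x length
stride = 2

# unfold extracts all overlapping patches; fold scatter-adds them back,
# so folding an all-ones unfolding gives the per-element patch counts
ones = torch.ones(1, 1, height, width)
patches = F.unfold(ones, kernel_size=length, stride=stride)
counts = F.fold(patches, output_size=(height, width),
                kernel_size=length, stride=stride)
print(counts[0, 0])
```

The same trick covers the 1D case by treating the signal as a 1-pixel-high image.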
2. My idea to implement this properly is to define an `autograd.Function`, as can be seen below, and apply this to the input before I extract the patches. Do you think this is a proper way to do this?
``````
class _PatchNormalizer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, length, stride):
        counts = _PatchNormalizer.get_patch_counts(input, length, stride)
        ctx.save_for_backward(counts)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        counts, = ctx.saved_tensors
        # divide by how often each element was used; clamp avoids zero division
        return grad_output / torch.clamp(counts, min=1.0), None, None

    @staticmethod
    def get_patch_counts(input, length, stride):
        counts = torch.zeros_like(input)
        for idx in range(0, input.size(0) - length + 1, stride):
            counts[idx:idx + length] += 1.0
        return counts
``````
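Put together, here is a self-contained 1D sketch of this idea (one possible completion, not a definitive implementation): an identity `forward` whose `backward` divides the incoming gradient by the patch counts, mirroring the manual normalization above.

```python
import torch

class PatchNormalizer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, length, stride):
        # count how often each element appears in a patch
        counts = torch.zeros_like(input)
        for idx in range(0, input.size(0) - length + 1, stride):
            counts[idx:idx + length] += 1.0
        ctx.save_for_backward(counts)
        return input.clone()

    @staticmethod
    def backward(ctx, grad_output):
        counts, = ctx.saved_tensors
        # clamp avoids zero division for elements used in no patch
        return grad_output / torch.clamp(counts, min=1.0), None, None

numel, length, stride = 10, 3, 2
input = torch.ones(numel, requires_grad=True)
target = torch.zeros(numel)

input_patches = PatchNormalizer.apply(input, length, stride).unfold(0, length, stride)
target_patches = target.unfold(0, length, stride)

loss = torch.sum((input_patches - target_patches) ** 2.0)
loss.backward()
print(input.grad)  # tensor([2., 2., 2., 2., 2., 2., 2., 2., 2., 0.])
```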