Vectorizing Custom Sliding Window

Hi, I need help vectorizing this code. E is a tensor of shape (B, 8, 256, 256), and K = 7 (the K in K-nearest neighbors). I want to slide a window over E and, for each pixel, find the K nearest pixel vectors (by Euclidean distance over the channel dimension) inside that pixel's window. Right now I do this with two nested Python loops over every pixel (256 × 256 = 65,536 iterations per call), which is very slow.
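Written out from the loop below, the per-pixel operation looks roughly like this. It is a sketch: p stands for `patch_size`, b indexes the batch, and the window is clipped at the image border the same way the slicing in the loop clips it.

$$
\begin{aligned}
\mathcal{W}_{ij} &= \{(u, v) : \max(0, i-p) \le u \le \min(H-1, i+p),\; \max(0, j-p) \le v \le \min(W-1, j+p)\} \\
d_{ij}(u, v) &= \lVert E_{b,:,u,v} - E_{b,:,i,j} \rVert_2, \qquad (u, v) \in \mathcal{W}_{ij} \\
Z_{b,:,i,j} &= \big[\, E_{b,:,u_1,v_1};\ E_{b,:,u_2,v_2};\ \dots;\ E_{b,:,u_K,v_K} \,\big] \in \mathbb{R}^{K C}
\end{aligned}
$$

Here $(u_0, v_0), (u_1, v_1), \dots, (u_K, v_K)$ are the K+1 window positions with the smallest $d_{ij}$, in increasing order of distance. $(u_0, v_0)$ is the query pixel $(i, j)$ itself (distance 0) and is dropped, so each output pixel stores K vectors of length C.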

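```python
import torch
import torch.nn.functional as F
from einops import rearrange, repeat

# Runs inside a module method; self.K and self.patch_size are module attributes
B, C, H, W = E.shape

# Output: K neighbor vectors per pixel, stacked along the channel dim -> (B, K*C, H, W)
Z = torch.zeros_like(E)
Z = repeat(Z, 'b c h w -> b (repeat c) h w', repeat=self.K)

### Vectorize this code
for i in range(H):
    for j in range(W):
        query = E[:, :, i, j]  # B, C

        # Window of up to (2*patch_size+1)^2 pixels, clipped at the image border
        tl_x = max(0, i - self.patch_size)
        tl_y = max(0, j - self.patch_size)
        br_x = min(H, i + self.patch_size)
        br_y = min(W, j + self.patch_size)

        neighbor_patch = E[:, :, tl_x:br_x+1, tl_y:br_y+1]  # B, C, Ph, Pw

        query = rearrange(query, 'b c -> b 1 c')
        neighbor_patch = rearrange(neighbor_patch, 'b c h w -> b (h w) c')

        distance = -torch.cdist(query, neighbor_patch)  # b 1 hw
        distance_softmax = F.softmax(distance, dim=-1)

        # K+1 nearest; the first one is the query pixel itself (distance 0)
        _, indices = torch.topk(distance_softmax, dim=-1, k=self.K + 1)  # b 1 (k+1)
        indices = repeat(indices.squeeze(1), 'b k -> b k c', c=C)
        neighbors = neighbor_patch.gather(dim=1, index=indices)  # b (k+1) c
        # Drop the query itself along the neighbor dim (not the channel dim) -> b (k c)
        neighbors = neighbors[:, 1:, :].reshape(B, -1)

        Z[:, :, i, j] = neighbors
```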
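One thing that makes vectorizing easier: the softmax in the loop does not change which indices `topk` returns, because softmax(-d) is strictly decreasing in d. Taking the K+1 *smallest* raw distances gives the same indices. The small check below illustrates this on random tensors; the shapes (49 window pixels, C = 8) are chosen only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K = 7
query = torch.randn(2, 1, 8, dtype=torch.float64)            # b 1 c
neighbor_patch = torch.randn(2, 49, 8, dtype=torch.float64)  # b (h w) c

dist = torch.cdist(query, neighbor_patch)  # b 1 hw, Euclidean distances

# Ranking used in the loop: largest softmax of negative distance
_, idx_softmax = torch.topk(F.softmax(-dist, dim=-1), k=K + 1, dim=-1)
# Same ranking without the softmax: smallest distances, sorted ascending
_, idx_smallest = torch.topk(dist, k=K + 1, dim=-1, largest=False)

same_ranking = torch.equal(idx_softmax, idx_smallest)
print(same_ranking)
```

The same argument applies to squared distances, since the square root is monotonic, so a vectorized version can skip both the softmax and the square root.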

I tried vectorizing it with F.unfold, but I am having difficulty getting the implementation right.
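A possible F.unfold version is sketched below, under some assumptions. The window is (2·patch_size+1)² pixels clipped at the border, as in the loop. The softmax is dropped and squared Euclidean distance is used, which yields the same ranking as softmax(-cdist). Every window, including those at the image corners, must contain at least K+1 real pixels, i.e. (patch_size+1)² ≥ K+1. The function name `knn_sliding_window` is hypothetical. With stride 1 and `padding=patch_size`, unfold produces exactly one window per pixel. The zero padding it adds is not real data, so a tensor of ones is unfolded the same way to build a validity mask, and padded positions get distance +inf so `topk` never selects them.

```python
import torch
import torch.nn.functional as F


def knn_sliding_window(E: torch.Tensor, K: int, patch_size: int) -> torch.Tensor:
    """Hypothetical vectorized version of the double loop.

    E: (B, C, H, W). Returns Z: (B, K*C, H, W), where channels k*C .. k*C+C-1
    hold the (k+1)-th nearest neighbor of each pixel (the pixel itself excluded).
    Assumes (patch_size + 1)**2 >= K + 1 so corner windows have enough pixels.
    """
    B, C, H, W = E.shape
    ks = 2 * patch_size + 1  # window side length
    L = ks * ks              # positions per window (padded ones included)
    N = H * W                # number of query pixels

    # All windows at once: (B, C*L, N) -> (B, C, L, N); channel-major, then window position
    patches = F.unfold(E, kernel_size=ks, padding=patch_size).reshape(B, C, L, N)

    # True where a window position lies inside the image, False on zero padding
    ones = E.new_ones(1, 1, H, W)
    valid = F.unfold(ones, kernel_size=ks, padding=patch_size).reshape(1, L, N) > 0

    # Squared Euclidean distance from each pixel to every position in its window
    query = E.reshape(B, C, 1, N)
    dist = (patches - query).pow(2).sum(dim=1)        # (B, L, N)
    dist = dist.masked_fill(~valid, float("inf"))     # never pick padded positions

    # K+1 smallest, sorted ascending; the first is the query pixel itself (distance 0)
    _, idx = dist.topk(K + 1, dim=1, largest=False)   # (B, K+1, N)
    idx = idx[:, 1:]                                  # (B, K, N)

    # Gather the neighbor vectors for every channel: (B, C, K, N)
    neighbors = patches.gather(2, idx.unsqueeze(1).expand(B, C, K, N))

    # Channel index k*C + c, the same layout as repeat(..., 'b c h w -> b (repeat c) h w')
    return neighbors.permute(0, 2, 1, 3).reshape(B, K * C, H, W)
```

In the module this would be called as `Z = knn_sliding_window(E, self.K, self.patch_size)`. The trade-off is memory: `patches` has B·C·(2·patch_size+1)²·H·W elements, and the subtraction creates a temporary of the same size. For example, with B = 1, C = 8, patch_size = 3 and a 256 × 256 input, that is 8 · 49 · 65,536 ≈ 25.7M elements (about 103 MB in float32) per tensor. If that is too much, one option is to run the same computation on chunks of rows and concatenate the results. If two pixels in a window have identical vectors, the order of the tied neighbors may differ from the loop version.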