How to create a trace matrix for im2col (unfold)

Hi there,

I want to create a trace matrix for the classic im2col function. Given a 4D tensor of shape [N, C, H, W] and an unfold function with a k x k kernel that generates L patches, I want a trace matrix of shape [N, L, H, W] whose value is 1 if the pixel lies in the corresponding patch and 0 otherwise.

A rough example:
input (2D):

[[1, 2, 3, 4],
 [5, 6, 7, 8]]

kernel size: 2x2, stride: 1

output matrix (one row per patch, one column per flattened pixel):

  1  2  3  4  5  6  7  8  (flatten idx)
[[1, 1, 0, 0, 1, 1, 0, 0],
 [0, 1, 1, 0, 0, 1, 1, 0],
 [0, 0, 1, 1, 0, 0, 1, 1]]

Is it possible to implement this without writing a CUDA module?
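
For reference, here is a minimal sketch of one way this might be done in plain PyTorch, assuming no padding and no dilation. The idea is to label each pixel with its flat index, run `torch.nn.functional.unfold` on that index map, and scatter ones at the indices each patch covers. The function name `unfold_trace_mask` and its parameters are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def unfold_trace_mask(height, width, kernel_size, stride=1):
    """Hypothetical helper: return an [L, H, W] mask of 0s and 1s.

    mask[l, i, j] is 1 if pixel (i, j) lies in patch l, else 0.
    Assumes no padding and no dilation.
    """
    # Label every pixel with its 0-based flat index. A float tensor is
    # used for unfold; float32 is exact for H * W below 2**24.
    idx = torch.arange(height * width, dtype=torch.float32)
    idx = idx.view(1, 1, height, width)

    # patches has shape [1, k*k, L]; column l holds the flat indices
    # of the pixels covered by patch l.
    patches = F.unfold(idx, kernel_size=kernel_size, stride=stride)
    cols = patches[0].t().long()  # [L, k*k]

    # Write a 1 at every covered index, one row per patch.
    mask = torch.zeros(cols.shape[0], height * width)
    mask.scatter_(1, cols, 1.0)
    return mask.view(-1, height, width)


# The 2x4 example from this post: 2x2 kernel, stride 1, so L = 3.
mask = unfold_trace_mask(2, 4, kernel_size=2, stride=1)
print(mask.view(3, -1))
```

The mask does not depend on the batch or channel dimension, so the [N, L, H, W] version could be obtained with `mask.unsqueeze(0).expand(N, -1, -1, -1)`. With padding, the zero fill would collide with flat index 0, so this sketch would need adjusting for that case.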