Proper use of unfold?

Consider the following 6x6 tensor.

tst_tensor = tensor([[1, 1, 1, 2, 3, 1],
        [1, 2, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [5, 5, 5, 5, 5, 5],
        [5, 5, 5, 0, 0, 0],
        [0, 2, 3, 0, 0, 0]])

I’d like to split it into 2x2 windows (or 4 x 1), as follows:

tensor([[1, 1, 1, 2],
        [1, 2, 0, 0],
        [3, 1, 0, 0], ...
        [0, 0, 0, 0]])

However, if I run the following code:

batched_input = tst_tensor.unsqueeze(0).unsqueeze(0)
unfold = nn.Unfold(kernel_size=(3, 3), stride=3, dilation=1)
windows = unfold(batched_input)

I get something of the right shape (1,9,4), but the values are not what I expect:

tensor([[[1., 2., 5., 5.],
         [1., 3., 5., 5.],
         [1., 1., 5., 5.],
         [1., 0., 5., 0.],
         [2., 0., 5., 0.],
         [0., 0., 5., 0.],
         [1., 0., 0., 0.],
         [0., 0., 2., 0.],
         [1., 0., 3., 0.]]])

Where am I going wrong?

I’m not sure how the example result would fit into a [4, 1] window, but you could try tensor.unfold, which does not collapse the patches into a single dimension:

tst_tensor = torch.tensor([[1, 1, 1, 2, 3, 1],
                        [1, 2, 0, 0, 0, 0],
                        [1, 0, 1, 0, 0, 0],
                        [5, 5, 5, 5, 5, 5],
                        [5, 5, 5, 0, 0, 0],
                        [0, 2, 3, 0, 0, 0]])

patches = tst_tensor.unfold(1, 4, 2)
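
For the non-overlapping 2x2 case described in the question, one possible sketch is to call tensor.unfold once per dimension with the window size equal to the step. The reshape to (9, 4) at the end is my assumption about the desired layout, not something stated in the reply above.

```python
import torch

tst_tensor = torch.tensor([[1, 1, 1, 2, 3, 1],
                           [1, 2, 0, 0, 0, 0],
                           [1, 0, 1, 0, 0, 0],
                           [5, 5, 5, 5, 5, 5],
                           [5, 5, 5, 0, 0, 0],
                           [0, 2, 3, 0, 0, 0]])

# unfold(dimension, size, step): size == step means the windows do not overlap.
# After both calls, patches[i, j] is the 2x2 window at block row i, block column j.
patches = tst_tensor.unfold(0, 2, 2).unfold(1, 2, 2)  # shape (3, 3, 2, 2)

# Flatten each window to 4 values so that torch.mode can run over the last dim.
windows = patches.reshape(-1, 4)  # shape (9, 4)
print(windows[0])  # top-left window, flattened row by row
```

The first rows of windows should line up with the rows listed in the question ([1, 1, 1, 2], then [1, 2, 0, 0], then [3, 1, 0, 0]).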

That’s definitely a lot closer than I’ve gotten! To give some background, the reason I’m trying to do this is to “shrink” a larger matrix by cutting it up into windows and then taking the mode of these windows. Similar to how you would interpolate an image, but by taking the mode instead of some mean or nearest neighbor.

What I meant by 4x1 window was just to have the windows flattened so torch.mode would work. A 2x2 matrix would work too but I haven’t figured out how to do it. Hopefully the example below makes it clearer:

# Each last dimension is made up of the 4 elements in a 2x2 window, with no overlap
# between the windows.
# There should be 9 of these windows for this particular example.

# Shape: 1 x 9 (num_windows) x 2 (window_size_x) x 2 (window_size_y)
the_dream = torch.tensor([[[1, 1], [1, 2]],
                          [[3, 1], [0, 0]], ...
                          [[5, 0], [3, 0]],
                          [[0, 0], [0, 0]]])

Have a look at this code.

For getting tiles, manually doing this is probably most efficient:

windows = tst_tensor.view(3, 2, 3, 2).permute(0, 2, 1, 3).reshape(1, 9, 4)

Best regards

Thomas
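
As a sanity check on the one-liner above, here is a small sketch that compares it against nn.Unfold with kernel_size=2 and stride=2. The question's code used kernel_size=(3,3) and stride=3, which produces four 3x3 windows stored as columns, so switching to a 2x2 kernel and transposing the result is my reading of what was intended. The .float() and .long() casts are there because I am assuming nn.Unfold may not accept integer tensors on every PyTorch version.

```python
import torch
import torch.nn as nn

tst_tensor = torch.tensor([[1, 1, 1, 2, 3, 1],
                           [1, 2, 0, 0, 0, 0],
                           [1, 0, 1, 0, 0, 0],
                           [5, 5, 5, 5, 5, 5],
                           [5, 5, 5, 0, 0, 0],
                           [0, 2, 3, 0, 0, 0]])

# Manual tiling: split rows into (block, offset) and columns into (block, offset),
# bring the two block indices next to each other, then flatten each 2x2 tile.
tiles = tst_tensor.view(3, 2, 3, 2).permute(0, 2, 1, 3).reshape(1, 9, 4)

# nn.Unfold returns (N, C * kh * kw, L): each window is a column, not a row.
unfold = nn.Unfold(kernel_size=2, stride=2)
cols = unfold(tst_tensor.float().unsqueeze(0).unsqueeze(0))  # shape (1, 4, 9)

# Transpose so that each row holds one flattened window.
unfolded = cols.transpose(1, 2).long()  # shape (1, 9, 4)

print(torch.equal(tiles, unfolded))
```

The view-based version avoids the float conversion and the extra batch and channel dimensions, which is probably why it is the lighter option when the windows never overlap.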

@tom Thank you for your reply!

If anyone else finds it useful, below is a function that implements what Thomas suggested in a more general way. It behaves the same way the original did (compared using a dataset of 480 x 640 images) AFAIK.

import torch

def fastCreateFeatureMap(masks, num_x_windows=48, num_y_windows=64):
    # Window sizes; masks must divide evenly into the requested windows.
    window_size_x = masks.size(0) // num_x_windows
    window_size_y = masks.size(1) // num_y_windows
    # Group the window indices together, then flatten each window into a row.
    windows = (masks.view(num_x_windows, window_size_x, num_y_windows, window_size_y)
               .permute(0, 2, 1, 3)
               .reshape(num_x_windows * num_y_windows, window_size_x * window_size_y))
    # torch.mode returns (values, indices); keep the values.
    return torch.mode(windows, dim=-1)[0].reshape(num_x_windows, num_y_windows)
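
To see what this computes on the small tensor from the question, the steps can be written out for 3 windows per axis (2x2 windows). This is only an illustrative sketch; the variable names are mine, and the expected values in the comments were worked out by hand from the example tensor.

```python
import torch

tst_tensor = torch.tensor([[1, 1, 1, 2, 3, 1],
                           [1, 2, 0, 0, 0, 0],
                           [1, 0, 1, 0, 0, 0],
                           [5, 5, 5, 5, 5, 5],
                           [5, 5, 5, 0, 0, 0],
                           [0, 2, 3, 0, 0, 0]])

# Same tiling as in the function, with 3 windows per axis and 2x2 windows.
windows = tst_tensor.view(3, 2, 3, 2).permute(0, 2, 1, 3).reshape(9, 4)

# Mode of each flattened window, arranged back into the 3x3 grid of windows.
modes = torch.mode(windows, dim=-1).values.reshape(3, 3)
print(modes)
# Top-left window [1, 1, 1, 2] has mode 1; bottom-right [0, 0, 0, 0] has mode 0.
# The window [0, 0, 5, 5] (middle row, right column) is a tie between 0 and 5,
# so the value reported there depends on how torch.mode breaks ties.
```

If ties matter for your data, it may be worth checking how torch.mode resolves them on your PyTorch version and device before relying on the result.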