Crop a tensor using a mask

I have a tensor of shape (B, C, H, W) and a range (x_min, y_min, x_max, y_max). I would like to apply a filter with that range on the tensor's H and W dimensions. E.g. for a tensor of shape (1, 3, 500, 500) and a range (100, 100, 200, 200), I would like to get the result of (1, 3, 100:200, 100:200), which has shape (1, 3, 100, 100). Any ideas on how to achieve this? What I have tried is to use the reduce function and build a mask to filter out the out-of-range pixels.

mask = reduce(torch.logical_and, (img[:, :, ?, :] >= range[1],
                                  img[:, :, ?, :] < range[3],
                                  img[:, :, :, ?] >= range[0],
                                  img[:, :, :, ?] < range[2]))

But the issue is I don’t know what to put at the position of “?” so that it compares the value at that dimension to the range value.

Is this what you mean?


import torch

def crop_img(img, x_tuple, y_tuple):
    return img[:, :, x_tuple[0]:x_tuple[1], y_tuple[0]:y_tuple[1]]

x = torch.rand(1, 3, 500, 500)
x_tuple=(100, 200)
y_tuple=(100, 200)
print(crop_img(x, x_tuple, y_tuple).size())
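For reference, the mask approach from the question can also be made to work: the "?" positions can be filled with explicit coordinate grids built via torch.arange and broadcasting. A sketch (variable names are assumptions):

```python
import torch

img = torch.rand(1, 3, 500, 500)
box = (100, 100, 200, 200)  # (x_min, y_min, x_max, y_max)

# Coordinate grids shaped so they broadcast against (B, C, H, W)
ys = torch.arange(img.size(2)).view(1, 1, -1, 1)  # H coordinates
xs = torch.arange(img.size(3)).view(1, 1, 1, -1)  # W coordinates

# (1, 1, H, W) boolean mask that is True inside the box
mask = (ys >= box[1]) & (ys < box[3]) & (xs >= box[0]) & (xs < box[2])

# Boolean indexing flattens the selected values, so reshape them back
cropped = img[mask.expand_as(img)].view(
    img.size(0), img.size(1), box[3] - box[1], box[2] - box[0])
print(cropped.shape)  # torch.Size([1, 3, 100, 100])
```

This gives the same result as plain slicing, just via a mask.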

I think this works. But I forgot to mention that my range is also a tensor (of shape (B, 4), where B is the number of images in the batch and the 4 values are (x_min, y_min, x_max, y_max)), since each image might have a different crop range. Is there a convenient way to apply this range tensor to the image tensor (B, C, H, W) and crop accordingly, or do I have to loop over each range and apply your slicing method to the corresponding image?

If we assume the output sizes will be the same on dim=(2,3) then you could do:

# H is sliced with the y-range, W with the x-range
cropped_imgs = torch.stack([img_batch[i, :, boxes[i, 1]:boxes[i, 3], boxes[i, 0]:boxes[i, 2]] for i in range(boxes.size(0))])

Note: wrote this on my phone, so it might be missing a bracket or have a typo.
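A runnable sketch of that loop with dummy data (names like boxes are assumptions; note that H is sliced with the y-range and W with the x-range, given the (x_min, y_min, x_max, y_max) layout):

```python
import torch

img_batch = torch.rand(4, 3, 500, 500)
# One (x_min, y_min, x_max, y_max) box per image; all crops must
# end up the same size on dims 2 and 3 or torch.stack will fail
boxes = torch.tensor([[10, 20, 110, 120],
                      [30, 40, 130, 140],
                      [ 0,  0, 100, 100],
                      [50, 60, 150, 160]])

cropped_imgs = torch.stack([
    img_batch[i, :, boxes[i, 1]:boxes[i, 3], boxes[i, 0]:boxes[i, 2]]
    for i in range(boxes.size(0))
])
print(cropped_imgs.shape)  # torch.Size([4, 3, 100, 100])
```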

Alright, thank you! I will take the loop solution for now. I still have one more question on cropping. Let's say we have a target size, for example (500, 500), and an image of (496, 504). If I want to pad the missing pixels and crop the extra pixels so that the image is reshaped to (500, 500), is there any way to do this with a mask?

You could use a combination of crop and pad from torchvision:

https://pytorch.org/vision/stable/transforms.html
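If you'd rather stay in plain torch, the same pad-then-crop logic can be sketched with torch.nn.functional.pad plus slicing (the function name pad_or_crop is made up for illustration; torchvision's Pad and CenterCrop transforms do the equivalent):

```python
import torch
import torch.nn.functional as F

def pad_or_crop(img, out_h, out_w):
    """img: (B, C, H, W). Pad a dim that is too small, crop one that is too big."""
    h, w = img.shape[-2:]
    # symmetric padding amounts (0 when the dim is already large enough);
    # F.pad takes (left, right, top, bottom) for the last two dims
    pad_h = max(out_h - h, 0)
    pad_w = max(out_w - w, 0)
    img = F.pad(img, (pad_w // 2, pad_w - pad_w // 2,
                      pad_h // 2, pad_h - pad_h // 2))
    # center crop (offsets are 0 when the dim already matches)
    h, w = img.shape[-2:]
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return img[..., top:top + out_h, left:left + out_w]

x = torch.rand(1, 3, 496, 504)
print(pad_or_crop(x, 500, 500).shape)  # torch.Size([1, 3, 500, 500])
```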