How to crop image tensor in model

Hi all,
I am a beginner with PyTorch, and I am trying to implement a complex CNN model called FEC-CNN from the paper “A Fully End-to-End Cascaded CNN for Facial Landmark Detection”. However, I ran into some problems while building it.

Here is the architecture of FEC-CNN:

And here is the architecture of a single sub-CNN:

Explaining the model a bit:
The input of FEC-CNN model is face images, and the output is 68 landmarks of those images.
First, an initial CNN predicts the initial 68 landmarks of the image. Those landmarks are then refined by the following sub-CNNs, each of which takes as input patches around the landmarks predicted by the previous stage.

Here are my questions:
1. How do I crop an image tensor based on the output of the CNN model? I know that it is very convenient in PyTorch to convert a tensor to NumPy, but is it correct to convert the tensor to NumPy for this processing and then convert it back to a tensor for the next model's input? I want gradients to be computed from the start all the way to the end.
2. Is it possible to use a batch size larger than 1, given that every image has to be cropped around different landmarks predicted by the model? Models process their inputs one batch at a time, so I don't see how to crop each image in a batch at a different position, other than by using batch size = 1.

Sorry for not giving any code; I have been stuck on these problems for a week.
Thank you for any answer, or possible hint.


Currently there doesn’t seem to be a function that can crop a tensor in PyTorch.
The only possible way that I can think of is converting it to a PIL Image and then cropping it.

For the second one:
Could you use a DataLoader on the landmarks and then set the batch size?
That is, crop the images using the CNN output, save the crops, and then put them in a DataLoader with whatever batch size you need.

Please correct me if wrong.


You can crop in a fully differentiable way with indexing, e.g.:

import torch

# Variable is no longer needed (PyTorch >= 0.4); tensors track gradients directly
image = torch.rand(256, 256, requires_grad=True)

# Crop a 128 x 128 subimage in the top-left corner
cropped_image = image[0:128, 0:128]

Since you have different landmarks for each image, you would have to first split your batch into single images, then crop, and then concatenate.
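A minimal sketch of that split/crop/concatenate pattern, assuming every crop has the same size so the results can be re-assembled into one batch (here with `torch.stack`, i.e. concatenation along a new batch dimension). The helper name `crop_per_image` and the `(y0, x0)` input format are hypothetical:

```python
import torch

def crop_per_image(images, top_left, size):
    """Crop a same-sized window at a different position from each image.

    images:   (N, C, H, W) tensor
    top_left: list of (y0, x0) integer pairs, one per image (hypothetical format)
    size:     (h, w) of every crop
    """
    h, w = size
    crops = []
    for img, (y0, x0) in zip(images, top_left):      # iterating over dim 0 yields (C, H, W) images
        crops.append(img[:, y0:y0 + h, x0:x0 + w])   # slicing stays in the autograd graph
    return torch.stack(crops)                        # re-assemble into (N, C, h, w)

images = torch.rand(4, 3, 64, 64, requires_grad=True)
crops = crop_per_image(images, [(0, 0), (10, 5), (20, 30), (32, 32)], (32, 32))
crops.sum().backward()
print(crops.shape)  # torch.Size([4, 3, 32, 32])
```

After `backward()`, `images.grad` is 1 inside each image's crop window and 0 elsewhere, which shows the gradient reaches the original batch through the per-image slices.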


Will my method also work?

Thanks a lot!

I didn’t know we could crop a Variable just like in NumPy.

I will also try the DataLoader.

It wouldn’t be differentiable, because the data loading pipeline uses PIL for cropping, or at least it did the last time I checked.

So I think I can convert the output to NumPy and use it to crop the Variable inside the model.
That won’t break the gradient flow, right?

You will stop gradient propagation any time you switch to numpy.
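One nuance worth separating: what loses its gradient is whatever passes through NumPy (or Python numbers). If only the crop *coordinates* are converted to integers and the cropped data stays a torch tensor, gradients still flow into the pixels; only the coordinates get none, and integer indexing is not differentiable with respect to the indices anyway. A small sketch with hypothetical `features` and `landmark` tensors:

```python
import torch

# Hypothetical stand-ins: a feature map and one predicted landmark (x, y)
features = torch.rand(1, 3, 64, 64, requires_grad=True)
landmark = torch.tensor([20.4, 33.7], requires_grad=True)

half = 8
# Turn the prediction into integer indices; .tolist() plays the same role as a NumPy round trip
x, y = landmark.detach().round().long().tolist()
patch = features[:, :, y - half:y + half, x - half:x + half]   # (1, 3, 16, 16)
patch.mean().backward()

print(features.grad.abs().sum().item() > 0)  # True: gradient reaches the cropped pixels
print(landmark.grad)                          # None: no gradient path back to the coordinates
```

If the landmark coordinates themselves need a gradient (end-to-end training of where to crop), integer indexing cannot provide it; a bilinear sampler such as `torch.nn.functional.grid_sample` is one option for that.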

Oh yeah right! Thanks!

Following the Data Loading and Processing Tutorial, I transposed my NumPy image (64 x 64 x 3) to a torch image (3 x 64 x 64).
How can I crop it with indexing?
I have tried torch.transpose(), but I am confused about its parameters dim0 and dim1.

Oh, I get it!
I just need to use it twice:

x = torch.transpose(x, 0, 2)  # (C, H, W) -> (W, H, C)
x = torch.transpose(x, 0, 1)  # (W, H, C) -> (H, W, C)

and I will get what I want!
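For reference, the two `transpose` calls above amount to a single `permute(1, 2, 0)`. Cropping itself does not need any transpose: a `(C, H, W)` tensor can be sliced on its last two dimensions directly. A small sketch, assuming a random 3 x 64 x 64 tensor in place of the tutorial's image:

```python
import torch

x = torch.rand(3, 64, 64)                               # (C, H, W) torch image
hwc = torch.transpose(torch.transpose(x, 0, 2), 0, 1)   # (C,H,W) -> (W,H,C) -> (H,W,C)
assert torch.equal(hwc, x.permute(1, 2, 0))             # one permute gives the same reordering

# Crop rows 10..41 and columns 5..36 of every channel, no transpose required
crop = x[:, 10:42, 5:37]
print(crop.shape)  # torch.Size([3, 32, 32])
```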

Is there any way to crop a mini-batch of images (i.e. a tensor of shape [N, C, H, W]) with a mini-batch of bounding boxes (i.e. a tensor of shape [N, 4])?

If all the bounding boxes are identical, it’s easy.

If the bounding boxes at least share the same width and height, a loop would be OK.

If the bounding boxes differ in position, width and height, a loop plus a collate_fn??
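For the middle case (same width and height, different positions), the loop can be replaced by advanced indexing: build per-image row and column index tensors and gather all crops at once, which stays differentiable with respect to the images. A sketch under those assumptions; `crop_same_size` is a hypothetical helper and it does no bounds checking (boxes must lie inside the image). For boxes of different sizes the crops cannot share one tensor, so they would need a list, padding, or resampling to a common size.

```python
import torch

def crop_same_size(images, y0, x0, h, w):
    """Crop an (h, w) window at a per-image offset without a Python loop.

    images: (N, C, H, W); y0, x0: (N,) integer tensors of top-left corners.
    """
    n = images.shape[0]
    rows = y0[:, None] + torch.arange(h, device=images.device)     # (N, h)
    cols = x0[:, None] + torch.arange(w, device=images.device)     # (N, w)
    batch = torch.arange(n, device=images.device)[:, None, None]   # (N, 1, 1)
    # Move channels last so the three index tensors are adjacent;
    # they broadcast to (N, h, w) and the result has shape (N, h, w, C).
    out = images.permute(0, 2, 3, 1)[batch, rows[:, :, None], cols[:, None, :]]
    return out.permute(0, 3, 1, 2)  # back to (N, C, h, w)

images = torch.rand(2, 3, 100, 120, requires_grad=True)
y0 = torch.tensor([0, 20])
x0 = torch.tensor([10, 0])
crops = crop_same_size(images, y0, x0, 50, 60)
print(crops.shape)  # torch.Size([2, 3, 50, 60])
```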

I found another solution inspired by STN (Spatial Transformer Networks): use an affine transform to do the cropping. See the discussion here: Cropping a minibatch of images, each image a bit differently
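Applied to the original landmark question, the same sampling idea can extract a fixed-size patch around every landmark of every image in one `grid_sample` call. Because sampling is bilinear, the result is differentiable with respect to the landmark coordinates as well as the images. This is a sketch under stated assumptions, not FEC-CNN's actual patch layer: `extract_patches` is a hypothetical helper, landmarks are pixel `(x, y)` coordinates, the patch is square, and the `align_corners=False` normalization is used.

```python
import torch
import torch.nn.functional as F

def extract_patches(images, landmarks, patch=16):
    """Bilinearly sample a (patch x patch) window centred on every landmark.

    images:    (N, C, H, W)
    landmarks: (N, L, 2) pixel coordinates (x, y), may be fractional
    returns:   (N, L, C, patch, patch)
    """
    n, c, h, w = images.shape
    num_lm = landmarks.shape[1]
    offs = torch.arange(patch, dtype=images.dtype, device=images.device) - (patch - 1) / 2
    oy, ox = torch.meshgrid(offs, offs, indexing="ij")   # (patch, patch) each
    px = landmarks[..., 0, None, None] + ox              # (N, L, patch, patch)
    py = landmarks[..., 1, None, None] + oy
    # Pixel coords -> [-1, 1] under align_corners=False (pixel i -> (2i+1)/size - 1)
    gx = (2 * px + 1) / w - 1
    gy = (2 * py + 1) / h - 1
    grid = torch.stack([gx, gy], dim=-1).view(n, num_lm * patch, patch, 2)
    out = F.grid_sample(images, grid, mode="bilinear", align_corners=False)
    return out.view(n, c, num_lm, patch, patch).permute(0, 2, 1, 3, 4)

images = torch.rand(2, 3, 64, 64)
landmarks = (torch.rand(2, 68, 2) * 63).requires_grad_()
patches = extract_patches(images, landmarks, patch=16)
patches.sum().backward()
print(patches.shape)         # torch.Size([2, 68, 3, 16, 16])
print(landmarks.grad.shape)  # torch.Size([2, 68, 2])
```

Patches that extend past the image border are zero-padded (the `grid_sample` default `padding_mode="zeros"`).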

Thanks for the link.

But can I use this with different crop sizes?

input  – input batch of images (N x C x H_i x W_i)
grid   – flow-field of size (N x H_o x W_o x 2)

Does it only work for a fixed crop size? In other words, are different crop sizes only possible with bs=1?


# Code adapted from
# https://discuss.pytorch.org/t/cropping-a-minibatch-of-images-each-image-a-bit-differently/12247/5 
###############

import cv2
import torch
import numpy as np

#############
def build_grid(bs, source_hgh, source_wid, target_y0, target_x0, target_hgh, target_wid):
    # Integer pixel coordinates covered by each crop: (bs, target_hgh) rows, (bs, target_wid) cols
    ys = target_y0[:bs].float().unsqueeze(1) + torch.arange(target_hgh).float()
    xs = target_x0[:bs].float().unsqueeze(1) + torch.arange(target_wid).float()
    # Map pixel centres to grid_sample's [-1, 1] range (align_corners=False: i -> (2i+1)/size - 1)
    ys = (2 * ys + 1) / source_hgh - 1
    xs = (2 * xs + 1) / source_wid - 1
    grid_y = ys[:, :, None].expand(-1, -1, target_wid)   # (bs, target_hgh, target_wid)
    grid_x = xs[:, None, :].expand(-1, target_hgh, -1)   # (bs, target_hgh, target_wid)
    # grid_sample expects the last dimension ordered as (x, y); call .cuda() on it if t is on the GPU
    return torch.stack([grid_x, grid_y], dim=-1)         # (bs, target_hgh, target_wid, 2)

############## 
bs = 2
source_hgh = 100
source_wid = 120
target_hgh = 50
target_wid = 60
target_y0 = torch.tensor([0, 20])
target_x0 = torch.tensor([10, 0])
############## 
img0 = cv2.imread('1.jpg')
img1 = cv2.imread('2.jpg')
m0 = img0[:source_hgh, :source_wid, :]
m1 = img1[:source_hgh, :source_wid, :]
cv2.imwrite('m0.jpg', m0)
cv2.imwrite('m1.jpg', m1)
n0 = torch.from_numpy(np.transpose(m0, (2, 0,1)))
n1 = torch.from_numpy(np.transpose(m1, (2, 0,1)))


t = torch.zeros(bs, 3, source_hgh, source_wid)
t[0] = n0
t[1] = n1

grid = build_grid(bs, source_hgh,source_wid, target_y0, target_x0, target_hgh, target_wid )
# Pass align_corners explicitly (False is the default since PyTorch 1.3)
crp  = torch.nn.functional.grid_sample(t, grid, align_corners=False)

r0 = crp[0].numpy()
r1 = crp[1].numpy()
r0 = np.transpose(r0, (1,2,0))
r1 = np.transpose(r1, (1,2,0))
cv2.imwrite('r0.jpg', r0)
cv2.imwrite('r1.jpg', r1)

#######
OK, I see. With F.affine_grid(theta, size), the bounding boxes can have different sizes and locations, but every crop is resampled to the same output size (output_height x output_width below).

import cv2
import torch
import numpy as np

##############
bs=2
source_width=100
source_height=120
output_width=70
output_height=60
theta=torch.zeros(bs,2,3)
target_y0=torch.tensor([0, 20],dtype=torch.float)
target_x0=torch.tensor([10, 0],dtype=torch.float)
target_y1=torch.tensor([60, 80],dtype=torch.float)
target_x1=torch.tensor([90, 60],dtype=torch.float)

theta[:, 0, 0] = (target_x1 - target_x0) / (source_width - 1)
theta[:, 0 ,2] = (target_x1 + target_x0 - source_width + 1) / (source_width - 1)
theta[:, 1, 1] = (target_y1 - target_y0) / (source_height - 1)
theta[:, 1, 2] = (target_y1 + target_y0 - source_height + 1) / (source_height - 1)
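# The theta formulas above divide by (size - 1), i.e. the align_corners=True convention
grid= torch.nn.functional.affine_grid(theta, (bs, 3, output_height, output_width), align_corners=True)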

###############
img0=cv2.imread('1.png')
img1=cv2.imread('2.png')
m0=img0[:source_height, :source_width, :]
m1=img1[:source_height, :source_width, :]
cv2.imwrite('m0.jpg', m0)
cv2.imwrite('m1.jpg', m1)
n1=torch.from_numpy(np.transpose(m1,(2, 0,1)))
n0=torch.from_numpy(np.transpose(m0,(2, 0,1)))

t=torch.zeros(bs, 3, source_height, source_width)
t[0]=n0
t[1]=n1

##############
crp=torch.nn.functional.grid_sample(t, grid, align_corners=True)  # must match affine_grid

r0=crp[0].numpy()
r1=crp[1].numpy()
r0=np.transpose(r0,(1,2,0))
r1=np.transpose(r1,(1,2,0))
cv2.imwrite('r0.jpg', r0)
cv2.imwrite('r1.jpg', r1)
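The theta construction above can be packaged as a function that takes the `[N, 4]` bounding-box tensor asked about earlier in the thread. This is a sketch assuming boxes are inclusive pixel coordinates `(x0, y0, x1, y1)` and the `align_corners=True` convention those formulas imply; `crop_and_resize` is a hypothetical name. Building theta with `torch.stack` also keeps it differentiable with respect to the boxes.

```python
import torch
import torch.nn.functional as F

def crop_and_resize(images, boxes, out_h, out_w):
    """Crop each box from its image and resample it to (out_h, out_w).

    images: (N, C, H, W) float tensor
    boxes:  (N, 4) float tensor of inclusive pixel coords (x0, y0, x1, y1)
    """
    n, c, h, w = images.shape
    x0, y0, x1, y1 = boxes.unbind(dim=1)
    zeros = torch.zeros_like(x0)
    # Same formulas as the post above (align_corners=True: pixel i -> 2i/(size-1) - 1)
    theta = torch.stack([
        torch.stack([(x1 - x0) / (w - 1), zeros, (x1 + x0 - w + 1) / (w - 1)], dim=1),
        torch.stack([zeros, (y1 - y0) / (h - 1), (y1 + y0 - h + 1) / (h - 1)], dim=1),
    ], dim=1)  # (N, 2, 3)
    grid = F.affine_grid(theta, (n, c, out_h, out_w), align_corners=True)
    return F.grid_sample(images, grid, mode="bilinear", align_corners=True)

images = torch.rand(2, 3, 120, 100)
boxes = torch.tensor([[10., 0., 90., 60.], [0., 20., 60., 80.]])
crops = crop_and_resize(images, boxes, 60, 70)
print(crops.shape)  # torch.Size([2, 3, 60, 70])
```

When the output size equals the box size (x1 - x0 + 1 by y1 - y0 + 1), the sample points land exactly on pixel centres and the result matches plain slicing up to floating-point error; otherwise each crop is bilinearly resized.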