Translate, rotate, center crop and resize a batch of images

I have a batch of images with shape [B, 3, H, W]. For each image in the batch, I want to translate it by a pixel location that is different for each image, rotate it by an angle that is different for each image, center crop it by its own crop size, and finally resize all of them to the same size.

Currently I’m using the following code with torchvision functions affine, rotate, center_crop and resize but it’s obviously not efficient since it processes each image separately. Any idea how to run these transformations in batch mode?

# images - [B, 3, H, W]
# translation_point - [B, 2]
# angle - [B, 1]
# crop_size - [B, 1]  Assume square cropping

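import torch
from torchvision.transforms.functional import affine, rotate, center_crop, resize

transformed_images = []
for i in range(images.shape[0]):
    # Shift translation_point (stored as row, col) to the image center
    img = affine(images[i], angle=0, scale=1.0, shear=0.0,
                 translate=((images.shape[3]/2 - translation_point[i, 1].item()),
                            (images.shape[2]/2 - translation_point[i, 0].item())))
    img = rotate(img, angle[i, 0].item(), expand=False)
    img = center_crop(img, (crop_size[i].item(), crop_size[i].item()))
    img = resize(img, 64)
    transformed_images.append(img)  # collect the result so torch.stack has something to stack
transformed_images = torch.stack(transformed_images, dim=0)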


Oddly enough, using the transformation functions from kornia is 3x slower than the above code.

# images - [B, 3, H, W]
# translation_point - [B, 2]
# angle - [B, 1]
# crop_size - [B, 1]  Assume square cropping

translation_point[..., 0], translation_point[..., 1] = (images.shape[3]/2 - translation_point[..., 1]),\
                                                       (images.shape[2]/2 - translation_point[..., 0])
images = kornia.geometry.transform.translate(images, translation_point)
images = kornia.geometry.transform.rotate(images, angle)
crop_bboxes = kornia.geometry.bbox.bbox_generator(images.shape[2]/2 - crop_size/2,
                                                  images.shape[3]/2 - crop_size/2,
                                                  crop_size, crop_size)
images = kornia.geometry.transform.crop_and_resize(images, crop_bboxes, (64, 64))

PyTorch has image transforms, which can be found in torchvision.transforms.

I will implement an example using those modules below. The example assumes that all of the images are uniform in size. If they are not, you will first need to iterate through them and resize them to a common size.

import torch
from torchvision.transforms import RandomAffine, RandomRotation, CenterCrop, Resize

batch_size = 128
channels = 3
H = 224
W = 224

images = torch.rand((batch_size, channels, H, W))

images = images.to('cuda:0')

affine = RandomAffine(degrees=0., translate=(0.1,0.1))
rotate = RandomRotation(9)
center = CenterCrop((24,24))
resize = Resize((32,32))

images = affine(images)
images = rotate(images)
images = center(images)
images = resize(images)


The speed advantage of the above is twofold. One, you process the whole batch in a single call instead of looping over the images one at a time. Two, you can easily send the images to a GPU for processing (i.e. .to('cuda:0')).

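Finally, to address your per-image translation: this likely won’t yield any benefit over letting the torchvision modules draw the random translation. In the above, you simply specify the maximum parameters and let the transforms sort out the rest. Note, though, that when these transforms are called on a batched tensor they draw one set of random parameters and apply it to every image in the batch.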

You can find all these implementations and more in kornia; they work on batches and support automatic differentiation.

I combined all transformations (translation, rotation, cropping, and resizing) into a single affine transformation and then applied it using F.affine_grid and F.grid_sample.
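For reference, here is a minimal sketch of how such a combined transform could look; it is not necessarily the exact code I ended up with. It assumes the points are given as (x, y) in continuous pixel coordinates, treats positive angles as counter-clockwise, and uses align_corners=False. The function name translate_rotate_crop_resize and its arguments are made up for illustration, and half-pixel conventions may differ slightly from the torchvision loop.

```python
import torch
import torch.nn.functional as F


def translate_rotate_crop_resize(images, points_xy, angles_deg, crop_sizes, out_size=64):
    """Hypothetical helper: one affine_grid/grid_sample pass per batch.

    images:     [B, C, H, W]
    points_xy:  [B, 2]  (x, y) pixel location that should end up at the center
    angles_deg: [B, 1]  rotation angle in degrees (assumed counter-clockwise)
    crop_sizes: [B, 1]  side length of the square center crop, in pixels
    """
    B, C, H, W = images.shape
    a = torch.deg2rad(angles_deg.reshape(B).float())
    c = crop_sizes.reshape(B).float()
    px = points_xy[:, 0].float()
    py = points_xy[:, 1].float()
    cos, sin = torch.cos(a), torch.sin(a)

    # theta maps normalized OUTPUT coords (u in [-1, 1]) to normalized INPUT coords:
    #   source_pixel = point + R(angle) @ (crop / 2 * u), then normalize by W and H.
    row_x = torch.stack([c / W * cos, -c / W * sin, 2 * px / W - 1], dim=1)
    row_y = torch.stack([c / H * sin, c / H * cos, 2 * py / H - 1], dim=1)
    theta = torch.stack([row_x, row_y], dim=1).to(images)  # [B, 2, 3]

    # The sampling grid is built at the small output resolution, not at H x W.
    grid = F.affine_grid(theta, (B, C, out_size, out_size), align_corners=False)
    return F.grid_sample(images, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=False)
```

With the tensors from the question, where translation_point appears to be stored as (row, col), the call would be translate_rotate_crop_resize(images, translation_point.flip(-1), angle, crop_size). Because the grid is generated at the 64x64 output size, the cost of affine_grid does not grow with H and W.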

@billiout you mean get_affine_matrix2d ?

@edgarriba get_affine_matrix2d is not returning the affine matrix I need. I want to first translate and then rotate, whereas get_affine_matrix2d builds an affine matrix that first rotates and then translates. In addition, I want my final affine matrix to be chained with the cropping and resizing operations, so that I can avoid building an affine grid on my initial high-resolution images (which is expensive to generate) and instead apply all transformations in the final low-resolution resized space. I believe crop_and_resize builds an affine matrix internally that can be used for this purpose.
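To illustrate why the order matters, here is a small sketch with plain 3x3 homogeneous matrices. The helpers translation_matrix and rotation_matrix are made up for this example (they are not kornia functions), and the rotation is about the origin for simplicity.

```python
import math
import torch


def translation_matrix(tx, ty):
    # Homogeneous 3x3 translation by (tx, ty)
    return torch.tensor([[1.0, 0.0, tx],
                         [0.0, 1.0, ty],
                         [0.0, 0.0, 1.0]])


def rotation_matrix(angle_deg):
    # Homogeneous 3x3 rotation about the origin
    a = math.radians(angle_deg)
    return torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                         [math.sin(a), math.cos(a), 0.0],
                         [0.0, 0.0, 1.0]])


T = translation_matrix(10.0, 0.0)
R = rotation_matrix(90.0)

# With column vectors, the matrix on the right is applied first.
translate_then_rotate = R @ T
rotate_then_translate = T @ R

origin = torch.tensor([0.0, 0.0, 1.0])
print(translate_then_rotate @ origin)  # approximately (0, 10, 1)
print(rotate_then_translate @ origin)  # approximately (10, 0, 1)
```

The two products send the same point to different places, so a helper that hard-codes one order cannot be reused for the other without recomposing the matrices.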

Once you have created the affine matrix, you can use warp_affine, or directly call the F.affine_grid and F.grid_sample operations on a normalized and inverted version of the affine matrix.
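A rough sketch of the "normalize and invert" step, under a few assumptions: M is a [B, 3, 3] forward matrix that maps source pixel indices (x, y) to destination pixel indices, and align_corners=False is used throughout. The names normalization_matrix and warp_with_pixel_matrix are invented for this example, and kornia's own normalization may follow a different convention.

```python
import torch
import torch.nn.functional as F


def normalization_matrix(height, width):
    # Maps pixel indices (x, y) to [-1, 1] coordinates for align_corners=False,
    # where pixel index i has its center at (2 * i + 1) / size - 1.
    return torch.tensor([[2.0 / width, 0.0, 1.0 / width - 1.0],
                         [0.0, 2.0 / height, 1.0 / height - 1.0],
                         [0.0, 0.0, 1.0]])


def warp_with_pixel_matrix(images, M, out_hw):
    B, C, H, W = images.shape
    h, w = out_hw
    n_src = normalization_matrix(H, W).to(M)
    n_dst = normalization_matrix(h, w).to(M)
    # affine_grid wants: normalized destination coords -> normalized source coords,
    # i.e. the inverse of the forward pixel matrix, wrapped in the normalizations.
    theta = n_src @ torch.linalg.inv(M) @ torch.linalg.inv(n_dst)  # [B, 3, 3]
    theta = theta[:, :2, :].contiguous().to(images)
    grid = F.affine_grid(theta, (B, C, h, w), align_corners=False)
    return F.grid_sample(images, grid, align_corners=False)
```

For instance, a matrix that translates by one pixel in x shifts the image content one column to the right, with zeros filling the column that has no source.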

The “crop” matrix is implemented via get_perspective_transform. We noticed at some point that crop and resize was faster using slicing. I think in the augmentations module you can choose either variant.
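As an illustration of the slicing idea (a sketch only, not kornia's actual implementation), a center crop with per-image sizes can be done with plain indexing followed by F.interpolate. The function name center_crop_and_resize_by_slicing is hypothetical, and crop sizes are assumed to be integers no larger than the image.

```python
import torch
import torch.nn.functional as F


def center_crop_and_resize_by_slicing(images, crop_sizes, out_size=64):
    B, C, H, W = images.shape
    patches = []
    # Crops differ in size per image, so the slicing itself is done in a loop.
    for img, c in zip(images, crop_sizes.reshape(B).tolist()):
        c = int(c)
        top, left = (H - c) // 2, (W - c) // 2
        patch = img[:, top:top + c, left:left + c]
        patches.append(F.interpolate(patch[None], size=(out_size, out_size),
                                     mode="bilinear", align_corners=False))
    return torch.cat(patches, dim=0)  # [B, C, out_size, out_size]
```

Slicing avoids building a sampling grid for the crop, but it only covers axis-aligned crops, so it cannot absorb the rotation the way a single affine matrix can.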

If you look closely, the affine matrix is a composition of the others (except for translation; I’ll make a quick PR adding get_translation_matrix).

One can easily compose and chain transforms based on the requirements.