When we use RandomPerspective() as part of data augmentation, the transformation is applied only to the image and not to the labels, which are also pixel-location-dependent; this leaves the two misaligned. Consider the following:
import numpy as np
import torchvision.transforms as T
import torchvision.transforms.functional as F

class RandomPerspective(object):
    def __init__(self, scale=0.5, prob=0.5):
        self.perspective_transformer = T.RandomPerspective(distortion_scale=scale, p=prob)

    def __call__(self, sample):
        image, labels = sample['image'], sample['labels']
        pil_image = F.to_pil_image(image.astype(np.uint8))
        image = np.asarray(self.perspective_transformer(pil_image))
        # labels are returned untouched, so they no longer line up with the warped image
        return {'image': image, 'labels': labels}
How does one, using this RandomPerspective(), RandomHorizontalFlip(), or other similar modules, apply the same transformation to the label coordinates so that they stay aligned with the transformed image?
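One approach I have been considering, sketched below under some assumptions: instead of calling the module directly, sample the transform parameters yourself with torchvision's T.RandomPerspective.get_params(width, height, distortion_scale), warp the image with F.perspective(img, startpoints, endpoints), and then apply the same warp to the label coordinates. The helpers perspective_matrix and warp_points are my own names, not torchvision API; they solve for the 3x3 homography that maps each startpoint to its endpoint and push (x, y) label coordinates through it with NumPy only:

```python
import numpy as np

def perspective_matrix(startpoints, endpoints):
    # Solve the standard 8-unknown linear system for the homography H
    # that sends each (x, y) startpoint to its (u, v) endpoint.
    A, b = [], []
    for (x, y), (u, v) in zip(startpoints, endpoints):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(points, H):
    # Apply homography H to an (N, 2) array of (x, y) coordinates,
    # dividing by the homogeneous coordinate.
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Inside __call__, one would then do something like startpoints, endpoints = T.RandomPerspective.get_params(w, h, scale); image = F.perspective(pil_image, startpoints, endpoints); labels = warp_points(labels, perspective_matrix(startpoints, endpoints)). For RandomHorizontalFlip the analogous fix is simpler: when the flip fires, mirror the x coordinate as x' = width - 1 - x.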