When we use RandomPerspective() as part of data augmentation, the transformation is applied only to the image and not to the labels, which are also pixel-location-dependent; this leaves the two misaligned. Consider the following:
import numpy as np
import torchvision.transforms as T
import torchvision.transforms.functional as F

class RandomPerspective(object):
    def __init__(self, scale=0.5, prob=0.5):
        self.perspective_transformer = T.RandomPerspective(distortion_scale=scale, p=prob)

    def __call__(self, sample):
        image, labels = sample['image'], sample['labels']
        pil_image = F.to_pil_image(image.astype(np.uint8))
        image = np.asarray(self.perspective_transformer(pil_image))
        # labels are returned untouched, so they no longer line up with the warped image
        return {'image': image, 'labels': labels}
How does one, using this RandomPerspective(), RandomHorizontalFlip(), or other similar modules, apply the same transformation to the label coordinates so that they stay aligned with the transformed image?
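One approach I have been considering, sketched below under some assumptions: instead of calling the module directly, sample the transform parameters yourself with torchvision's T.RandomPerspective.get_params(width, height, distortion_scale), warp the image with F.perspective(img, startpoints, endpoints), and then apply the same warp to the label coordinates. The helpers perspective_matrix and warp_points are my own names, not torchvision API; they solve for the 3x3 homography that maps each startpoint to its endpoint and push (x, y) label coordinates through it with NumPy only:

```python
import numpy as np

def perspective_matrix(startpoints, endpoints):
    # Solve the standard 8-unknown linear system for the homography H
    # that sends each (x, y) startpoint to its (u, v) endpoint.
    A, b = [], []
    for (x, y), (u, v) in zip(startpoints, endpoints):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(points, H):
    # Apply homography H to an (N, 2) array of (x, y) coordinates,
    # dividing by the homogeneous coordinate.
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Inside __call__, one would then do something like startpoints, endpoints = T.RandomPerspective.get_params(w, h, scale); image = F.perspective(pil_image, startpoints, endpoints); labels = warp_points(labels, perspective_matrix(startpoints, endpoints)). For RandomHorizontalFlip the analogous fix is simpler: when the flip fires, mirror the x coordinate as x' = width - 1 - x.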