Image rotation or image flip and knowledge of new location of certain pixels

I’m working with a facial keypoint dataset and want to augment the data with random rotations and flips. For that I also need to infer the new locations of the keypoints.

I’m trying to build a transform myself but got confused, since most transforms that I know and use do not expect to modify the labels as well.

What is the right way to approach this? Is there an existing implementation somewhere?

There is a lot of confusion online about this subject, and I think a tutorial on it would be useful.
Anyway, I saw some people just pass a mask and transform it. I tried to implement this myself: first I turn the coordinate vector into a sparse matrix (same size as the image, all zeros except at the keypoint locations), then pass it through the same transformations as the image, and finally apply one more transformation that rebuilds the targets from the transformed mask.
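
One caveat with the mask round trip can be sketched with NumPy. This is an illustration only, not the dataset code from this thread: the array size and coordinates are made up. Listing the non-zero positions returns them in row-major order as (row, column) pairs, so both the keypoint order and the (x, y) convention can change on the way back. As far as I know, torch.nonzero orders its result the same way.

```python
import numpy as np

# Two keypoints as (x, y) pixel coordinates, in a meaningful order
# (say, keypoint 0 and keypoint 1 of a face). Values are made up.
keypoints = np.array([[5, 9], [7, 2]])

# Encode them as a sparse mask: row index = y, column index = x.
mask = np.zeros((12, 12), dtype=np.uint8)
mask[keypoints[:, 1], keypoints[:, 0]] = 1

# Decode by listing the non-zero positions (sorted row-major).
recovered = np.argwhere(mask)

print(recovered.tolist())  # [[2, 7], [9, 5]]
```

Keypoint 1 now comes first, and each pair reads (y, x) instead of (x, y), so a target rebuilt this way no longer lines up with the original label columns.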

The question remains: how do I know what the input and output of each transform are?
Let’s say I modified my dataset class’s __getitem__ function to return a dictionary with image, labels and mask (labels being the keypoint coordinates in the original picture, and mask being a sparse matrix with ones at those coordinates and zeros elsewhere). How do I now pass this dataset together with a Compose object and make sure the transforms handle this dictionary correctly?

This tutorial is quite old by now and written for Theano + Lasagne, however you could reuse the approach for the augmentation of the keypoints as it walks you through it step by step.

If you want to apply the same “random” transformation of two different image tensors, you could use the functional API of torchvision.transforms as described here.
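
The same idea carries over to keypoints: draw the random decision once, then apply it to the image and to the coordinates. Below is a minimal sketch for a horizontal flip, written with NumPy and assuming the keypoints are (x, y) pixel coordinates. The function name `random_hflip` and its parameters are hypothetical, not part of torchvision.

```python
import numpy as np

def random_hflip(image, keypoints, p=0.5, rng=None):
    """Flip an (H, W) image and its (K, 2) keypoints (x, y) together."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:  # one random draw decides for both image and labels
        width = image.shape[1]
        image = image[:, ::-1].copy()  # mirror the columns
        keypoints = keypoints.copy()
        keypoints[:, 0] = (width - 1) - keypoints[:, 0]  # mirror the x coordinate
    return image, keypoints

# Usage: a single bright pixel at row 2, column 1 and its keypoint (x=1, y=2).
img = np.zeros((4, 6))
img[2, 1] = 1.0
kps = np.array([[1.0, 2.0]])
flipped_img, flipped_kps = random_hflip(img, kps, p=1.0)
print(flipped_kps.tolist())  # [[4.0, 2.0]]
```

For facial keypoints a flip also swaps semantic roles (the left eye ends up where the right eye was), so the corresponding rows may need to be swapped as well; which rows pair up depends on the column order of the dataset.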

I followed your advice and performed the augmentation manually inside the __getitem__ function of the dataset class.
Still, I was confused about how to perform the validation split, because I pass random_split the custom dataset class, which, as I said, already performs the augmentation.

Further, just to check for other errors, I tried to run the program with both the validation and training sets augmented. *** There is a strange error occurring after TF.rotate(mask) that adds non-zero values to the mask. I can’t find the exact location, but it’s somewhere in the conversion to PIL or in the rotation. ***

import os
import numpy as np
from pandas import read_csv
from sklearn.utils import shuffle
import torch
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

class FacialKeypoints(Dataset):

    def __init__(self, test=False, cols=None,FTRAIN = 'data/Q3/training.csv', FTEST = 'EX1/Q3/test.csv', transform_vars=None):
        fname = FTEST if test else FTRAIN
        df = read_csv(os.path.expanduser(fname))  # load pandas dataframe

        # The Image column has pixel values separated by space; convert
        # the values to numpy arrays:
        df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))

        if cols:  # get a subset of columns
            df = df[list(cols) + ['Image']]

        print('number of values in each column: ', df.count())  # prints the number of values for each column
        df = df.dropna()  # drop all rows that have missing values in them
        X = np.vstack(df['Image'].values) / 255.  # scale pixel values to [0, 1]
        X = X.astype(np.float32)
        image_size = int(np.sqrt(X.shape[1]))
        Y = []
        if not test:  # only FTRAIN has any target columns
            y = df[df.columns[:-1]].values
            y2 = y.reshape(y.shape[0],15,2)
            for coords in y2:
                mask = np.zeros((image_size,image_size))
                for pair in coords:
                    pair = pair.round().astype(int)
                    mask[pair[1], pair[0]] = 1  # keypoint (x, y) goes to row y, column x
                Y.append(mask)
            Y = np.array(Y)
            y = (y - 48) / 48  # scale target coordinates to [-1, 1]
            X, y, Y = shuffle(X, y, Y, random_state=42)  # shuffle train data
            y = y.astype(np.float32)
        else:
            y = None

        self.X = torch.tensor(X,dtype=torch.float32)
        self.transform_vars = transform_vars
        self.y = torch.tensor(y)
        self.Y = torch.tensor(Y,dtype=torch.float32)
        print('finished loading')
    def __len__(self):
        return len(self.X)

    def transform(self,image, mask):
        image = image.reshape(96,96)
        flip_prob = self.transform_vars['flip_probability']
        rotate_prob = self.transform_vars['rotate_probability']

        print('before',torch.nonzero(mask, as_tuple=False).reshape(-1).shape[0])
        if torch.rand(1)<flip_prob:
            image = TF.hflip(image)
            mask = TF.hflip(mask)
        if torch.rand(1)<rotate_prob:
            avg_pixel = image.mean()
            degrees = self.transform_vars['degrees']
            deg = int(torch.rand(1).item() * degrees - degrees)
            image_r = TF.to_tensor(TF.rotate(TF.to_pil_image(image),deg)).squeeze()
            image_r[(image_r==0) * (image!=0)] = avg_pixel
            image = image_r
            mask = TF.to_pil_image(mask)
            print('after pil', np.count_nonzero(np.asarray(mask)))
            mask = TF.rotate(mask, deg)
            print('after rotate', np.count_nonzero(np.asarray(mask)))
            mask = TF.to_tensor(mask).squeeze()
            #mask = TF.to_tensor(TF.rotate(TF.to_pil_image(mask), deg)).squeeze()
        print('after tensor',torch.nonzero(mask, as_tuple=False).reshape(-1).shape[0])
        return image, mask

    def update_target(self,mask):
        keypoints = torch.nonzero(mask,as_tuple=False).reshape(-1)
        keypoints = torch.from_numpy((keypoints.numpy() - 48) / 48)
        return keypoints

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        image = self.X[idx]
        keypoints = self.y[idx]
        mask = self.Y[idx]
        if self.transform_vars['is']:
            image, mask = self.transform(image, mask)
            keypoints = self.update_target(mask)
            return {'image':image, 'keypoints':keypoints}
        return {'image':image,'keypoints':keypoints}

This is the dataset class. It loads the data, converts the image strings to values, removes NaNs and builds a “mask”: a matrix of zeros and ones, with ones where there should be a facial keypoint. In the augmentation part, both the image and the mask go through the same transformations, and then the mask goes through one more transformation (update_target) to become a vector of size 30, which is the target.

the main script:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from preprocess import FacialKeypoints
import numpy as np
from torch.utils.data import random_split
transformed_dataset = FacialKeypoints(transform_vars={'is':True,'degrees':20,'flip_probability':0.5,'rotate_probability':0.8})
num_train = int(np.ceil(len(transformed_dataset) * 0.85))
num_val = int(len(transformed_dataset) - num_train)
batch_size = 16
trainset,valset = random_split(transformed_dataset,[num_train,num_val])
trainloader = DataLoader(trainset, batch_size=batch_size,
                        shuffle=True, num_workers=0)
valoader = DataLoader(valset, batch_size=batch_size,
                        shuffle=True, num_workers=0)

device = torch.device('cuda')

model2 = nn.Sequential(
total_loss = {'train':[],'val':[]}
criterion1 = nn.MSELoss()
criterion2 = nn.MSELoss()
optimizer2 = torch.optim.Adam(model2.parameters(),lr=0.001)

total_loss = {'train':[],'val':[]}
for epoch in range(100):
    print('in epoch {}/100 :'.format(epoch+1))
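    losses = []
    for sample in trainloader:
        input = sample['image'].to(device)
        batch = input.shape[0]
        input = input.view([batch, 1, 96, 96])
        target = sample['keypoints'].to(device)
        optimizer2.zero_grad()
        output = model2(input)
        loss2 = criterion2(output,target)
        loss2.backward()
        optimizer2.step()
        losses.append(loss2.item())
    a = np.sum(losses)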
    print('train loss = {}'.format(a))
    losses = []
    for sample in valoader:
        with torch.no_grad():
            input = sample['image'].to(device)
            batch = input.shape[0]
            input = input.view([batch, 1, 96, 96])
            target = sample['keypoints'].to(device)
            output = model2(input)
            loss2 = criterion2(output, target)
            losses.append(loss2.item())
    a = np.sum(losses)

'''def check_sample(loader=valoader,model=model2,device=device):
device2 = torch.device('cpu')
plots = 16//3
x = next(iter(loader))
y_true = x['keypoints']
y_true = y_true.reshape(16,15,2)
x = x['image'].to(device)
x = x.view(16,1,96,96)
y = model(x)
y = y.reshape(16,15,2).to(device2)
x =

fig,ax = plt.subplots(3,plots)
for i in range(plots):
    for j in range(3):

Finally, I get a strange error about the batch size, which I suspect is caused by the target changing size.

line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [30] at entry 0 and [26] at entry 3

I would really appreciate any hint about what could cause this error, and also about how to augment the data only for training and not for validation.
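
The error message itself can be reproduced in isolation. The sketch below uses dummy tensors rather than the real targets and shows that stacking only works when every target in the batch has the same length, which is what the default collate function relies on according to the traceback above.

```python
import torch

# Targets of equal length stack into one batch tensor.
equal = [torch.zeros(30), torch.zeros(30)]
batch = torch.stack(equal, 0)
print(batch.shape)  # torch.Size([2, 30])

# Targets of different lengths (30 vs. 26 values) cannot be stacked.
ragged = [torch.zeros(30), torch.zeros(26)]
try:
    torch.stack(ragged, 0)
    stack_failed = False
except RuntimeError as err:
    stack_failed = True
    print("stack failed:", err)
```

So any sample whose target has fewer (or more) than 30 values is enough to break a batch.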

I pinpointed the problem but am still not sure how to solve it.

Also, I’m not sure if it’s possible to reproduce without the data.

Still, I have a 2D tensor of size (96, 96) with all elements 0 except for 15 that contain 1.
Let’s call this tensor mask. So:

>>> import torch
>>> import torchvision.transforms.functional as TF
>>> deg = 20
>>> torch.nonzero(mask, as_tuple=False)
tensor([[27, 38],
        [27, 52],
        [28, 14],
        [29, 80],
        [36, 28],
        [36, 65],
        [37, 20],
        [37, 36],
        [37, 58],
        [37, 73],
        [59, 47],
        [71, 29],
        [71, 46],
        [71, 64],
        [82, 46]])
>>> mask = TF.to_pil_image(mask)
>>> mask = TF.rotate(mask, deg)
>>> mask = TF.to_tensor(mask).squeeze()
>>> torch.nonzero(mask, as_tuple=False)
tensor([[19, 72],
        [27, 45],
        [29, 68],
        [31, 60],
        [34, 54],
        [41,  9],
        [42, 33],
        [43, 25],
        [47, 18],
        [58, 51],
        [64, 71],
        [70, 54],
        [76, 38],
        [80, 58]])

If you count, you’ll notice that there were 15 non-zero elements before the transform and 14 after.
This is random: sometimes it leads to fewer elements, sometimes to more, and sometimes to exactly the desired 15. Of course, the randomness might be caused by the random generator for the degrees, which I didn’t include in this comment.

I think you might lose values due to the interpolation applied in TF.rotate.
Could you set resample to PIL.Image.NEAREST for this method and check whether your output contains all points?

If you are lazily loading the data, just create two datasets. One with the training transformation, the other one with the validation transformation, and wrap both datasets in Subsets using the randomly split data indices.
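
A minimal sketch of that setup follows. `ToyKeypoints` is a hypothetical stand-in for the real dataset with an `augment` switch, and the 85/15 split ratio is taken from the script above; only `Subset` and `torch.randperm` are actual PyTorch APIs here.

```python
import torch
from torch.utils.data import Dataset, Subset

class ToyKeypoints(Dataset):
    """Hypothetical stand-in dataset; `augment` turns the random flip on or off."""
    def __init__(self, n=20, augment=False):
        self.images = torch.arange(n * 4, dtype=torch.float32).reshape(n, 2, 2)
        self.augment = augment

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        if self.augment and torch.rand(1).item() < 0.5:
            image = torch.flip(image, dims=[-1])  # augmentation only when enabled
        return image

# Two views of the same data: one augmented, one not.
train_full = ToyKeypoints(augment=True)
val_full = ToyKeypoints(augment=False)

# One random permutation of the indices, shared by both views.
perm = torch.randperm(len(train_full)).tolist()
num_train = (85 * len(train_full)) // 100
trainset = Subset(train_full, perm[:num_train])
valset = Subset(val_full, perm[num_train:])
print(len(trainset), len(valset))  # 17 3
```

Because both subsets index into the same permutation, no sample ends up in both splits, and the validation samples are never augmented.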

PIL.Image.NEAREST doesn’t seem to solve the problem.
I tried

    mask = TF.rotate(mask, deg,resample=PIL.Image.NEAREST )

and also:

    mask = TF.rotate(mask, deg,resample=PIL.Image.NEAREST ,fill=0)

I’m still getting strange interpolation results. What is even stranger is that for some runs I get more pixels than I should.
What I mean is that I count the non-zero elements before and after the transform, and I get more non-zero elements after the transform for some pictures, fewer for others, and sometimes the right amount.

What is more surprising is that I don’t see anyone else having trouble with this. Is there an easy workaround?

This might indeed be expected if, e.g., a single input pixel with a non-zero value “lands” on multiple output pixel locations after the rotation. I assumed that nearest-neighbour interpolation would select a single output location, but this doesn’t seem to be the case.

E.g. if you have 4 single pixels with a non-zero value in the input image forming the corners of a rectangle, the rotated image would show a rotated rectangle, but each corner might come out bigger or smaller.
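
To make this concrete, here is a small NumPy sketch. It is not PIL's actual implementation; it assumes the rotation is resampled by inverse mapping (each output pixel looks up the nearest input pixel), which is how such resamplers are commonly written. The function name `nearest_rotation_hits` is made up. It counts how many output pixels read from each input pixel.

```python
import numpy as np

def nearest_rotation_hits(size=96, deg=20.0):
    """Count how many output pixels sample each input pixel under a
    nearest-neighbour rotation implemented by inverse mapping."""
    theta = np.deg2rad(deg)
    c = (size - 1) / 2.0  # rotate about the centre of the pixel grid
    rows, cols = np.mgrid[0:size, 0:size]
    x, y = cols - c, rows - c
    # Inverse mapping: the input location each output pixel samples from.
    src_x = np.cos(theta) * x - np.sin(theta) * y + c
    src_y = np.sin(theta) * x + np.cos(theta) * y + c
    src_c = np.rint(src_x).astype(int)
    src_r = np.rint(src_y).astype(int)
    inside = (src_r >= 0) & (src_r < size) & (src_c >= 0) & (src_c < size)
    hits = np.zeros((size, size), dtype=int)
    np.add.at(hits, (src_r[inside], src_c[inside]), 1)
    return hits

hits = nearest_rotation_hits()
centre = hits[24:72, 24:72]  # region that stays inside the frame after rotating
print("input pixels sampled 0 times:", int((centre == 0).sum()))
print("input pixels sampled 2+ times:", int((centre >= 2).sum()))
```

An isolated non-zero pixel that is sampled zero times vanishes from the output, and one that is sampled twice shows up as two pixels, which matches the varying counts reported above.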

For your use case you could thus use the keypoint coordinates directly instead of rotating an image.
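
A minimal sketch of that, using NumPy and a rotation matrix. The helper `rotate_keypoints` is hypothetical. It assumes keypoints are (x, y) pixel coordinates, that a positive angle means counter-clockwise on screen (the convention TF.rotate documents), and that the rotation centre is the middle of the pixel grid; the centre should be checked against whatever rotates the image.

```python
import numpy as np

def rotate_keypoints(keypoints, deg, center):
    """Rotate (K, 2) keypoints given as (x, y) pixel coordinates.

    Positive `deg` rotates counter-clockwise as seen on screen, where the
    y axis points down; `center` is the (x, y) point to rotate about.
    """
    theta = np.deg2rad(deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Rotation matrix for image coordinates (y down), counter-clockwise on screen.
    rot = np.array([[cos_t, sin_t],
                    [-sin_t, cos_t]])
    center = np.asarray(center, dtype=np.float64)
    return (np.asarray(keypoints, dtype=np.float64) - center) @ rot.T + center

size = 96
center = ((size - 1) / 2, (size - 1) / 2)  # assumed pixel-centre convention
kps = np.array([[57.5, 47.5],   # 10 px to the right of the centre
                [47.5, 57.5]])  # 10 px below the centre
rotated = rotate_keypoints(kps, 90, center)
print(np.round(rotated, 3))  # right -> up, below -> right
```

The coordinates stay floating point, so no keypoint can merge with another or disappear, and the order of the rows is preserved. Points that leave the image after rotation still need a policy of their own (clipping, dropping the sample, or retrying with another angle).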

Yes, in despair I turned to implementing it myself. Still, I run into the same error; maybe you can check my code and see if something catches your eye?

It’s basically a function that takes a pair from the iterable returned by torch.nonzero(mask, as_tuple=False) and returns the rotated pair:

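from math import atan2, cos, sin, hypot, radians

def tilt(pair, deg):
    # pair is a (row, col) pixel index in a 96x96 image, deg the angle in degrees
    theta = radians(deg)
    # shift to image-centred coordinates scaled to [-1, 1]
    x = (float(pair[1]) - 48) / 48
    y = (float(pair[0]) - 48) / 48
    # rotate in polar form: same radius, shifted angle
    angle = atan2(y, x) - theta
    size = hypot(x, y)
    temp = size * cos(angle)
    # keep the old coordinate if the rotated one falls outside (-1, 1)
    x = temp if -1 < temp < 1 else x
    temp = size * sin(angle)
    y = temp if -1 < temp < 1 else y
    # back to integer pixel indices
    x = int(round(x * 48 + 48))
    y = int(round(y * 48 + 48))
    return (y, x)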

More generally, I’m sure this issue will come up again and should be addressed.

I haven’t verified your code, but this general approach should work.
This example shows how to use a rotation matrix and might be useful in case you want to extend your use case.

I’m not sure which issue you are referring to, but if you think it’s an issue that single pixels might be “blurred” in the rotated image, i.e. they would spread to more than a single pixel location, I think this is rather a known image processing “issue”.