Separate mask_transform and image_transform in Albumentations

I was trying to implement the original UNet and use some augmentations from the Albumentations library.
UNet expects an input of 572x572 and produces an output of 388x388, so the image and the mask need to end up at different sizes. Albumentations, however, takes the image and the mask together in a single call and applies the same augmentation to both. Here is an example taken from their repo:

from torch.utils.data import Dataset

class OxfordPetDataset(Dataset):
    def __init__(self, images_filenames, images_directory, masks_directory, transform=None):
        self.images_filenames = images_filenames
        self.images_directory = images_directory
        self.masks_directory = masks_directory
        # One transform, applied jointly to the image and its mask
        self.transform = transform

albumentations_examples/pytorch_semantic_segmentation.ipynb at master · albumentations-team/albumentations_examples · GitHub
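
To make the size mismatch concrete: with the original UNet's unpadded convolutions, the 388x388 output corresponds to the central region of the 572x572 input tile. The small sketch below only works out where that central window sits; `center_crop_bounds` is a hypothetical helper name, not something from Albumentations.

```python
from typing import Tuple


def center_crop_bounds(in_size: int, out_size: int) -> Tuple[int, int]:
    """Return (start, stop) indices of the centered out_size window inside in_size."""
    if out_size > in_size:
        raise ValueError("out_size must not exceed in_size")
    start = (in_size - out_size) // 2
    return start, start + out_size


# Sizes from the post: 572x572 input tile, 388x388 output map.
start, stop = center_crop_bounds(572, 388)
print(start, stop)  # 92 480
```

So the mask would need 92 pixels removed from each side to line up with the network output, while the image stays at 572x572.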

Since the transform receives two inputs, the image and the mask, I tried to define self.mask_transform = mask_transform and add lines like the ones below to __getitem__. It returned an error saying it needed the image argument before it could do anything, so I guess this approach is pointless when what I want is separate mask and image augmentation:

        augmentations2 = self.mask_transform(mask=mask)
        mask = augmentations2["mask"]

Is it possible to transform the image and the mask separately in Albumentations, or are there other libraries that support this?
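
One possible workaround, given as a minimal sketch rather than a confirmed Albumentations recipe: since a pipeline insists on an `image` argument, the mask can be passed under the `image` key of its own pipeline and read back from the same key. The class name `SeparateTransformDataset` and the stand-in `center_crop_388` are hypothetical; the stand-in only mimics the calling convention (keyword arguments in, dict out) so the example runs without Albumentations installed. In real code the class would subclass `torch.utils.data.Dataset`, and the stand-in would be replaced by an actual pipeline.

```python
import numpy as np


class SeparateTransformDataset:
    """Hypothetical dataset sketch: one pipeline for images, another for masks."""

    def __init__(self, images, masks, image_transform=None, mask_transform=None):
        self.images = images
        self.masks = masks
        self.image_transform = image_transform
        self.mask_transform = mask_transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image, mask = self.images[idx], self.masks[idx]
        if self.image_transform is not None:
            image = self.image_transform(image=image)["image"]
        if self.mask_transform is not None:
            # The mask is passed under the `image` key, so the pipeline sees
            # the argument it requires; the result comes back under that key.
            mask = self.mask_transform(image=mask)["image"]
        return image, mask


def center_crop_388(image):
    """Stand-in for a real pipeline such as A.Compose([A.CenterCrop(388, 388)])."""
    top = (image.shape[0] - 388) // 2
    left = (image.shape[1] - 388) // 2
    return {"image": image[top:top + 388, left:left + 388]}


dataset = SeparateTransformDataset(
    images=[np.zeros((572, 572, 3), dtype=np.uint8)],
    masks=[np.zeros((572, 572), dtype=np.uint8)],
    mask_transform=center_crop_388,
)
image, mask = dataset[0]
print(image.shape, mask.shape)  # (572, 572, 3) (388, 388)
```

Two caveats apply to this approach. Random geometric transforms run in separate calls will not stay in sync between image and mask, and a mask passed as `image` is resampled with image interpolation rather than mask interpolation. It therefore seems best suited to deterministic steps such as the final crop, with the shared random augmentations still done in one joint call.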