Augment image and bounding box lazily

I want to perform image augmentations lazily (i.e., during model training). However, this affects only the images themselves, not the bounding box (defined by two coordinates). Imgaug could do just that, but it only works preemptively and therefore needs massive amounts of memory. Any idea how to integrate the bounding box augmentation into the forward pass?

The detection models in torchvision use the resize_keypoints and resize_boxes helper functions (defined in torchvision/models/detection/transform.py), so you might be able to reuse these (and potentially other detection transformations).