Variable-sized batch as input while training a CV model

Is there a way to create a batch of images with different sizes while training a model?
I know one way is to feed the images in one by one, accumulate the gradients, and then backpropagate.
But I wanted to know whether it is possible to create such a batch directly, so that I can just call model(batch) where the batch contains images with different heights and widths.
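For reference, the one-by-one approach mentioned above could look roughly like the sketch below. The tiny model, the random images, and the labels are all placeholders invented for illustration; the model uses adaptive pooling so it accepts any height and width.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy model; AdaptiveAvgPool2d makes it independent of input H and W.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Placeholder images of different sizes, with placeholder class labels.
images = [torch.rand(3, 32, 48), torch.rand(3, 64, 40), torch.rand(3, 50, 50)]
labels = [torch.tensor([0]), torch.tensor([1]), torch.tensor([0])]

optimizer.zero_grad()
for img, label in zip(images, labels):
    logits = model(img.unsqueeze(0))  # batch of one: (1, 3, H, W)
    # Divide by the number of images so the summed gradients match a mean loss.
    loss = F.cross_entropy(logits, label) / len(images)
    loss.backward()  # gradients accumulate in each parameter's .grad
optimizer.step()  # one update for the whole "virtual" batch
```

This gives the same update as a real batch of three for a model without batch-dependent layers, but it runs one forward pass per image.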

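You can look at torchvision.transforms. A single batch tensor needs every image to have the same shape, so the usual approach is to resize the images to a common height and width with the transforms in that module and then stack them. The same module also provides rotation, flipping, and other augmentations.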