Single GPU optimization

Is it possible to forward a whole batch of images in a single pass when every image has different dimensions (in production, not during training)? The images could of course be zero-padded to a common size, but consider a 1x3x200x800 and a 1x3x800x200 tensor: padding both yields a 2x3x800x800 tensor (this can happen in my scenario), which holds four times as many elements as the two original images combined. In that case it would probably be faster to forward the images separately. Is there a better way to do this if GPU memory is not a problem?
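
To put a number on the padding cost, here is a rough sketch in plain Python (no GPU or framework needed). It assumes every image is described by a (C, H, W) shape with the same channel count, and the helper names `padding_overhead` and `group_by_shape` are made up for illustration. The first helper computes how many times larger the zero-padded batch is than the original images. The second groups image indices by identical shape, so that images that already match could be batched together without any padding.

```python
from collections import defaultdict
from math import prod


def padding_overhead(shapes):
    """Ratio of elements in the zero-padded batch to elements in the originals.

    shapes: list of (C, H, W) tuples; all images are assumed to share C.
    """
    channels = shapes[0][0]
    max_h = max(h for _, h, _ in shapes)
    max_w = max(w for _, _, w in shapes)
    padded = len(shapes) * channels * max_h * max_w  # size of the padded batch
    actual = sum(prod(shape) for shape in shapes)    # size of the real data
    return padded / actual


def group_by_shape(shapes):
    """Map each distinct shape to the indices of the images that have it."""
    groups = defaultdict(list)
    for index, shape in enumerate(shapes):
        groups[shape].append(index)
    return dict(groups)


# The two images from the question: 1x3x200x800 and 1x3x800x200.
shapes = [(3, 200, 800), (3, 800, 200)]
print(padding_overhead(shapes))  # 4.0
print(group_by_shape(shapes))
```

A ratio close to 1.0 suggests padding is cheap and one batched forward pass is reasonable. A large ratio, like the 4.0 here, suggests comparing it against separate forward passes or per-shape batches. The point at which padding stops paying off depends on the model and hardware, so it would have to be measured.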