How does `torchvision.datasets.ImageFolder()` make images coherent with pretrained CNNs?

Hi everybody,
I retrained SqueezeNet with greyscale 128 by 128 images, only using the torchvision.transforms.ToTensor() transform, via torchvision.datasets.ImageFolder(). It yielded no error and training went fine, although the original network should only accept RGB 224 by 224 images.
However, when I try to load a single image through PIL in order to classify it with the model I trained previously, I get a dimension mismatch error (Invalid dimensions for image data). What exactly does torchvision.datasets.ImageFolder() do in order to fit my images into the preexisting architecture? How can I reproduce the same transformations torchvision.datasets.ImageFolder() applies to the data, so that I can classify a single image successfully?

Edit: Going through the torchvision.datasets.ImageFolder() code, I figured out that by default PIL is used to load the images, and each image is automatically converted to RGB via img.convert('RGB'). However, I still do not know how the difference between the spatial resolutions is handled.