Then I thought of doing a preprocessing step: save all the preprocessed images as .jpg with torchvision.utils.save_image and use these preprocessed images as the Dataset for training.
There is an actual speedup, but I noticed that save_image changes the images. In practice, if I 1) open the original image (with PIL), 2) transform the image, 3) save the transformed image, and 4) reopen the transformed image (with PIL), the tensors from 2) and 4) are different (the norm of the difference is on the order of 155).
This has at least 3 side effects:
I am training on a Dataset that is not what I expect, i.e. the original image + preprocessing;
When I subsequently classify some images I have to load them, preprocess, save, and reopen, because …
… I do not understand what happens, i.e. why 2) and 4) are different.
I tried removing the normalization T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), and then 2) and 4) are almost equal (the norm of the difference is roughly 1.5), but the tensors are still different.
Am I doing something completely wrong trying to save the preprocessed images? If not, what is the proper way to do it?
Do you know why even reverting the normalization the tensors are different?
P.S. saving the tensor directly works, but the occupied space is roughly the same as the original 18,000-image dataset, while the resized and cropped preprocessed .jpg files occupy much less.
If you are asking why the saved and loaded tensors are different, that is because JPEG compression is lossy (even with the maximum quality level) due to the nature of the DCT-based compression it uses.
Could you post some more details about your setup (e.g., CPU resources)? Are you perhaps already fully utilizing the CPU for image preprocessing?
Thanks @eqy for your answer. You’re right: I tried saving the images (after removing the normalization) as .png and the tensors are equal.
Anyway, training works fine even if I preprocess the images at each epoch; I only noticed that saving the preprocessed images could be a time optimization.
As far as I can tell, I cannot directly save the preprocessed image (without removing the normalization first) because there is some information loss even with a lossless compression. I do not know exactly why, but I imagine that something gets lost in the conversion from float to [0, 255] (I noticed that negative floats are set to zero and floats > 1 are set to 1). So I have to conclude that I cannot avoid some image preprocessing during training, is that correct?
Yes, some rounding would occur: 32-bit floating point has far more than 8 bits of mantissa, meaning far more than 256 distinct values can exist in [0, 1].
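The clamping and rounding can be seen by mimicking the scale-round-clamp step that save_image applies before writing (the sample values below are made up to show each effect): negative values collapse to 0, values above 1 collapse to 255, and nearby floats collapse to the same 8-bit level.

```python
import torch

t = torch.tensor([-0.2, 0.0, 0.5004, 0.5008, 1.3])

# What save_image does before writing: scale to [0, 255], round, clamp, quantize
q = t.mul(255).add(0.5).clamp(0, 255).to(torch.uint8)
restored = q.float() / 255

print(q)  # tensor([  0,   0, 128, 128, 255], dtype=torch.uint8)
# -0.2 -> 0 (clamped), 1.3 -> 255 (clamped), 0.5004 and 0.5008 both -> 128
```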
I would not say you must preprocess during training, but in practice I do not see it done for common vision models, as the preprocessing typically includes some random data augmentations that wouldn’t really make sense to persist to storage. If your preprocessing only produces a constant-size amount of data, then it could make sense, and the rounding differences you see may not be significant in terms of impact on model quality. However, you would have to be careful that during inference or deployment you use the same preprocessing steps (the inference data should also be rounded/quantized if this was done for the training data).