Hi everyone,
I’m trying to understand the reasoning behind scaling 8-bit PIL images from the range [0, 255] to [0, 1]. I’m aware of whitening techniques and assume this practice is similar or aims to accomplish the same goal, but in practice the scaling feels counterproductive.
I don’t have any rigorous understanding or backing for this, but when I’m training models, reconstruction losses like MSE always seem to work better with [0, 255] images. Intuitively, I assumed that a numerically higher loss value produces larger gradients, which update the parameters more rapidly toward a better reconstruction. I would appreciate any clarification.
In addition to this, when I’m working with [0, 255] images I tend to use the default 1e-3 as my learning rate. Based on this, should I be using an even smaller learning rate when working with [0, 1] images? Something like 1e-5 or 1e-4? Any advice would be very much appreciated.
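For concreteness, here is a toy sketch of what I mean by the scale difference (made-up tensors, not from my actual training code). The same prediction/target pair gives an MSE roughly 255² times larger, and gradients roughly 255 times larger, in [0, 255] than in [0, 1]:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy prediction/target pair in [0, 1] (shapes are arbitrary)
pred_01 = torch.rand(1, 3, 8, 8, requires_grad=True)
target_01 = torch.rand(1, 3, 8, 8)

loss_01 = F.mse_loss(pred_01, target_01)
loss_01.backward()
grad_01 = pred_01.grad.clone()

# The exact same data, rescaled to [0, 255]
pred_255 = (pred_01.detach() * 255).requires_grad_()
target_255 = target_01 * 255

loss_255 = F.mse_loss(pred_255, target_255)
loss_255.backward()

print(loss_255 / loss_01)                # ~255**2 = 65025: loss scales quadratically
print((pred_255.grad / grad_01).mean())  # ~255: gradient w.r.t. the output scales linearly
```

If that’s right, then the “better” behavior I see with [0, 255] might just be equivalent to training in [0, 1] with a much larger effective learning rate, rather than a better-conditioned problem.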
Thank you so much for your answer! This is definitely informative. I just wanted to ask: when you say you “automatically reduce the learning rate 10x when learning stagnates”, do you mean this happens automatically in code during training? Or do you train up to a checkpoint, analyze the loss and gradient graphs, and then train with a lower learning rate from that checkpoint onward?
No, there is an automatic LR scheduler in torch that does this; search for “pytorch learning rate scheduler”. To decide when to reduce, it monitors a loss you compute (e.g., on a validation set) and lowers the learning rate when that loss stops improving (not fully sure, but I think that’s how it works).
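I believe the one I was describing is `torch.optim.lr_scheduler.ReduceLROnPlateau`. A minimal sketch of the usual wiring (the model, data, and metric below are placeholders; you pass in whichever loss you want it to watch):

```python
import torch
import torch.nn.functional as F

# Placeholder model/optimizer just to show the wiring
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Cut the LR to 10% of its value once the watched metric
# stops improving for `patience` epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10
)

x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy validation batch

for epoch in range(50):
    # ... the usual training steps would go here ...
    with torch.no_grad():
        val_loss = F.mse_loss(model(x), y)
    # The scheduler only sees the value you pass it; validation
    # loss is typical, but any scalar metric works
    scheduler.step(val_loss)
```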