# Logit transform of dataset

Someone suggested that I apply a logit transform to the dataset before passing it to the model. Has anyone heard of this before?

Is the following correct in the sense of performing a logit transform on the dataset? Or should I apply it before any of the other transformations?

```python
transform_train = tr.Compose([
    tr.Resize((32, 32)),  # <-- should I put the logit transform here?
    tr.RandomCrop(im_sz),
    tr.RandomHorizontalFlip(),
    tr.ToTensor(),
    tr.Normalize((.5, .5, .5), (.5, .5, .5)),
    lambda x: x + args.sigma * torch.randn_like(x),
    lambda x: -torch.log(1 / x - 1),  # <-- logit transform
])
```

Hello Kirk!

I am somewhat skeptical of the idea. I certainly wouldn’t do it without
also building an equivalent model without the logit transformation and
checking that the logit transformation improves things.

I think you have to wrap your `lambda` in a pytorch `tr.Lambda`
transformation.

The logit function maps a probability in `[0.0, 1.0]` to `[-inf, inf]`.
There is no reason to think that your data reside in `[0.0, 1.0]`,
and, even if they did, the `tr.Normalize` could well push them outside
of `[0.0, 1.0]`. (A mean of 0.5 and a standard deviation of 0.5 don’t
mean that the result will lie within `[0.0, 1.0]`; `tr.Normalize` applies
the fixed map `(x - 0.5) / 0.5`, which sends `[0.0, 1.0]` onto
`[-1.0, 1.0]`, so after `tr.Normalize` you will have values outside of
`[0.0, 1.0]`.)

If your raw input data were naturally probabilities I think it would make
sense to apply the logit transformation. But otherwise my intuition is
that doing so is not likely to be helpful. In any event, the raw data, or
a transformed version of it, had better lie within `(0.0, 1.0)`; outside
that interval the argument of the `log` goes negative, so the logit
transformation produces `NaN`s, and at the endpoints it diverges to
`-inf` and `+inf`.
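
A quick check of how this logit (`-torch.log(1/x - 1)`) behaves inside, at, and outside `[0.0, 1.0]`:

```python
import torch

def logit(x):
    # same formula as in the pipeline above
    return -torch.log(1 / x - 1)

inside = logit(torch.tensor([0.25, 0.5, 0.75]))  # finite, monotone
edges = logit(torch.tensor([0.0, 1.0]))          # diverges to -inf, +inf
outside = logit(torch.tensor([-0.5, 1.5]))       # log of a negative: NaN

print(inside, edges, outside)
```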

Best.

K. Frank

Hi K. Frank,

Thanks for the reply. Let me give a bit of context, which I think might help clarify things.
Basically I’m trying what people call an “energy-based model” (disclaimer: I hate this name, it’s just a probabilistic model) on Fashion-MNIST. Since FMNIST has only one channel and I didn’t want to change the underlying model the authors were using, I just used `tr.Grayscale(num_output_channels=3)` to give it 3 channels.

Training the model, I can only get as high as 83% accuracy. So I reached out to the original authors for tips, and one of their recommendations was the logit transform.

Now that I’ve tried it, I can say with certainty that it gave worse results than without it.