Logit transform of dataset

Someone suggested that I apply a logit transform to the dataset before passing it to the model. Has anyone heard of this before?

Is the following the correct way to apply a logit transform to the dataset? Or should I apply it before any of the other transformations?

    transform_train = tr.Compose(
        [tr.Resize((32, 32)),  # <-- should I put logit transform here?
         tr.Pad(4, padding_mode="reflect"),
         tr.RandomCrop(im_sz),
         tr.RandomHorizontalFlip(),
         tr.ToTensor(),
         tr.Normalize((.5, .5, .5), (.5, .5, .5)),
         lambda x: x + args.sigma * torch.randn_like(x),
         lambda x: -torch.log(1/x - 1)]  # <-- logit transform
    )

Hello Kirk!

I am somewhat skeptical of the idea. I certainly wouldn’t do it without
also building an equivalent model without the logit transformation and
checking that the logit transformation improves things.

Some comments:

Your logit function looks right.

I think you have to wrap your lambda in a PyTorch tr.Lambda
transformation.

The logit function maps a probability in [0.0, 1.0] to [-inf, inf].
There is no reason to think that your data reside in [0.0, 1.0],
and, even if they did, the tr.Normalize could well push them outside
of [0.0, 1.0]. (The mean of 0.5 and standard deviation of 0.5 don’t
mean that the result will lie strictly within [0.0, 1.0]; deviations
of one-and-a-half and two standard deviations are quite likely, so after
tr.Normalize, you are likely to have values outside of [0.0, 1.0].)
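
To make the range concrete: tr.Normalize computes (x - mean) / std for each
value. Assuming the inputs are 8-bit images, which tr.ToTensor scales to
[0, 1], a mean of 0.5 and std of 0.5 spread them over [-1, 1]. Here is a
minimal plain-Python sketch of that arithmetic (the helper name normalize is
made up for illustration, it is not a torchvision function):

```python
def normalize(x, mean=0.5, std=0.5):
    """Same per-value arithmetic as tr.Normalize: (x - mean) / std."""
    return (x - mean) / std

# Pixel values as tr.ToTensor would produce them for 8-bit images, i.e. in [0, 1].
pixels = [0.0, 0.25, 0.5, 0.75, 1.0]
normalized = [normalize(p) for p in pixels]
print(normalized)  # [-1.0, -0.5, 0.0, 0.5, 1.0]

# Values the logit can still handle, i.e. strictly inside (0, 1).
inside = [v for v in normalized if 0.0 < v < 1.0]
print(inside)  # [0.5]
```

In this toy example only one of the five normalized values is still a valid
input for the logit, and that is before any noise is added.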

If your raw input data were naturally probabilities I think it would make
sense to apply the logit transformation. But otherwise my intuition is
that doing so is not likely to be helpful. In any event, the raw data, or
a transformed version of it, had better lie within [0.0, 1.0]; otherwise
the logit transformation is no longer well defined (torch.log of a negative
argument returns nan), and you will risk (unnecessary) divergences.
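
If you do want to try it, here is a sketch of one way to keep the logit
finite. It assumes the tensor comes straight from tr.ToTensor (values in
[0, 1]) and that the logit is used instead of tr.Normalize rather than after
it. The helper name safe_logit and the squeeze factor alpha are illustrative
choices of mine, not something from the original code or the authors' advice:

```python
import torch

def safe_logit(x, alpha=0.05):
    """Logit of x after squeezing [0, 1] into [alpha, 1 - alpha].

    alpha is a hypothetical squeeze factor; it keeps 0 and 1 away from
    the endpoints, where the logit diverges to -inf / +inf.
    """
    x = alpha + (1.0 - 2.0 * alpha) * x
    return torch.log(x) - torch.log1p(-x)  # algebraically equal to -log(1/x - 1)

x = torch.tensor([0.0, 0.5, 1.0])
out = safe_logit(x)
print(out)  # finite everywhere, approximately 0 at x = 0.5

# The original formula applied to values outside (0, 1), e.g. after Normalize:
bad = torch.tensor([-1.0, 0.0, 1.5])
raw = -torch.log(1 / bad - 1)
print(raw)  # nan, -inf, nan
```

It could be wrapped as tr.Lambda(safe_logit) and placed after tr.ToTensor.
Whether it actually helps the model is a separate question that only a
side-by-side comparison can answer.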

Best.

K. Frank

Hi K. Frank,

Thanks for the reply. Let me give a bit of context, which I think might help clarify things.
Basically I’m trying what people call an “energy-based model” (disclaimer: I hate this name, it’s just a probabilistic model) on Fashion-MNIST. Since Fashion-MNIST has only one channel and I didn’t want to change the underlying model the author was using, I just used tr.Grayscale(num_output_channels=3) to convert the images to 3 channels.

Training the model, I can only get as high as 83% accuracy. So I reached out to the original authors for tips, and one of their recommendations was the logit transform.

Now that I’ve tried it, I can say with certainty that it gave worse results than training without it.