# Is F.gumbel_softmax correct?

In pytorch/functional.py (at commit b4ed13ea0ff091328c6d0dfdf5d751d4280fb67f of pytorch/pytorch on GitHub) you can see that, in F.gumbel_softmax, samples from Gumbel(0, 1) are drawn in the following way:

    gumbels = (
        -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format).exponential_().log()
    )  # ~Gumbel(0, 1)


Is this correct? torch.empty_like returns a tensor with uninitialized memory, and nothing here seems to initialize it.

Yes, it’s correct. You are right that torch.empty_like creates a tensor with uninitialized memory, but the subsequent in-place .exponential_() call overwrites every entry with samples from an exponential distribution, as described in the docs.
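As a quick sanity check (a sketch; the `memory_format` argument is dropped for brevity), you can draw a large batch this way and compare against the known Gumbel(0, 1) moments: the mean is the Euler-Mascheroni constant and the variance is pi²/6.

```python
import math

import torch

torch.manual_seed(0)

logits = torch.zeros(100_000)

# empty_like allocates uninitialized memory, .exponential_() overwrites every
# entry in place with a draw from Exp(1), and -log(Exp(1)) ~ Gumbel(0, 1).
gumbels = -torch.empty_like(logits).exponential_().log()

print(gumbels.mean())  # ≈ 0.5772, the Euler-Mascheroni constant
print(gumbels.var())   # ≈ pi**2 / 6 ≈ 1.6449
```

So the uninitialized values never leak through: every element is replaced before it is read.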


I see, thank you! I thought .exponential_() was f(x) = e^x!

Yeah, that’s a natural assumption, but the elementwise f(x) = e^x is done by tensor.exp(); .exponential_() draws random samples instead.
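A minimal sketch of the difference: tensor.exp() is a deterministic elementwise function, while tensor.exponential_() is an in-place sampler that discards the tensor’s current contents.

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0])

# tensor.exp() computes e^x elementwise and returns a new tensor.
y = x.exp()
print(y)  # tensor([1.0000, 2.7183, 7.3891])

# tensor.exponential_() ignores the current values of x and refills it
# in place with draws from Exp(rate) (rate=1 by default), all positive.
torch.manual_seed(0)
x.exponential_()
print(x)  # three random positive values
```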

Hi, I find that many other implementations of Gumbel softmax use the formula $G_i = -\log(-\log(U_i)),\ U_i \sim \mathrm{Uniform}(0, 1)$ to generate Gumbel noise. Why does the PyTorch implementation sample from an exponential distribution?

Both approaches are valid, as described in The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables (https://arxiv.org/pdf/1611.00712.pdf):

> The apparently arbitrary choice of noise gives the trick its name, as −log(−log U) has a Gumbel distribution. This distribution features in extreme value theory (Gumbel, 1954) where it plays a central role similar to the Normal distribution: the Gumbel distribution is stable under max operations, and for some distributions, the order statistics (suitably normalized) of i.i.d. draws approach the Gumbel in distribution. The Gumbel can also be recognized as a −log-transformed exponential random variable. So, the correctness of (9) also reduces to a well known result regarding the argmin of exponential random variables. See (Hazan et al., 2016) for a collection of related work, and particularly the chapter (Maddison, 2016) for a proof and generalization of this trick.
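The last point can be checked numerically: if $U \sim \mathrm{Uniform}(0, 1)$, then $E = -\log U$ is a valid $\mathrm{Exp}(1)$ sample (by the inverse-CDF method), so the two formulations produce identical values when fed the same uniforms. A small sketch:

```python
import torch

torch.manual_seed(0)

# Draw uniforms once, then express the Gumbel noise both ways.
u = torch.rand(5)

# "Classic" formulation used in many implementations:
g_classic = -torch.log(-torch.log(u))

# PyTorch-style formulation: E = -log(U) is an Exp(1) sample,
# and -log(E) is then Gumbel(0, 1).
e = -torch.log(u)
g_exp = -torch.log(e)

# Fed with the same uniforms, the two formulas agree exactly.
assert torch.allclose(g_classic, g_exp)
```

Sampling Exp(1) directly via `.exponential_()` just skips the intermediate uniform, which is why the PyTorch source takes that route.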

Thank you very much!