Yes, you're right that `torch.empty_like` creates a tensor with uninitialized memory, but the `.exponential_()` call that follows overwrites every element in place with samples from an exponential distribution, as described in the docs.
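Here's a minimal sketch of that pattern, in case it helps. The `logits` tensor and its shape are made up for illustration, and the steps are split across lines for readability rather than copied from the library source:

```python
import torch

# Hypothetical input; any floating-point tensor would do.
logits = torch.randn(2, 5)

# empty_like only allocates memory with the same shape/dtype/device;
# the contents are arbitrary at this point.
noise = torch.empty_like(logits)

# exponential_() overwrites the tensor in place with samples from an
# exponential distribution (rate 1 by default).
noise.exponential_()

# Taking -log of exponential samples gives Gumbel-distributed noise.
gumbels = -noise.log()

print(gumbels.shape)
```

So the uninitialized values never reach the result, because every element is replaced before it is read.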

Hi, I've noticed that many other implementations of Gumbel softmax generate the Gumbel noise with the formula $G_i = -\log(-\log(U_i))$, $U_i \sim U(0, 1)$. Why does the torch implementation sample from an exponential distribution instead?

The apparently arbitrary choice of noise gives the trick its name, as −log(−log U) has a Gumbel distribution. This distribution features in extreme value theory (Gumbel, 1954) where it plays a central role similar to the Normal distribution: the Gumbel distribution is stable under max operations, and for some distributions, the order statistics (suitably normalized) of i.i.d. draws approach the Gumbel in distribution. The Gumbel can also be recognized as a −log-transformed exponential random variable. So, the correctness of (9) also reduces to a well known result regarding the argmin of exponential random variables. See (Hazan et al., 2016) for a collection of related work, and particularly the chapter (Maddison, 2016) for a proof and generalization of this trick.
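In other words, if U ~ U(0, 1) then −log U is an Exponential(1) variable, so −log(−log U) and −log(E) with E ~ Exponential(1) are the same distribution. As a rough sanity check, here is a sketch in plain Python (standard library only, not the PyTorch code; the function names are made up for illustration) that draws Gumbel noise both ways and compares the sample means against the known Gumbel(0, 1) mean, the Euler-Mascheroni constant:

```python
import math
import random

EULER_GAMMA = 0.5772156649  # mean of the standard Gumbel distribution


def gumbel_from_uniform(rng):
    """G = -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_from_exponential(rng):
    """G = -log(E) with E ~ Exponential(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against log(0) in the (very rare) zero draw
        e = rng.expovariate(1.0)
    return -math.log(e)


def sample_mean(sampler, n, seed):
    """Average n draws from sampler, using a seeded generator."""
    rng = random.Random(seed)
    return sum(sampler(rng) for _ in range(n)) / n


mean_u = sample_mean(gumbel_from_uniform, 100_000, seed=0)
mean_e = sample_mean(gumbel_from_exponential, 100_000, seed=1)

print(f"uniform route:     {mean_u:.3f}")
print(f"exponential route: {mean_e:.3f}")
print(f"Gumbel(0, 1) mean: {EULER_GAMMA:.3f}")
```

Both means should land close to 0.577. Matching means is of course only a sanity check, not a proof; the argument itself is the one in the quoted passage.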