GELU PyTorch formula?

Sam_Lerman · May 5, 2020, 7:14pm

Hi, for a particular reason I need to derive the derivative of GELU by hand, and it has to match the GELU implementation in PyTorch. I tried to find the exact GELU formula, but I'm a little confused. Could someone help by confirming whether the following is exactly equivalent to torch.nn.GELU:

0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))

If not, could you provide the correct explicit formula? (Not for the derivative, just for the raw GELU.)

ptrblck · May 6, 2020, 6:45am

Based on this test, gelu should correspond to:

const auto y_exp = x * 0.5 * (1.0 + torch::erf(x / std::sqrt(2.0)));

torch::erf probably calls the standard library implementation on the CPU, or the CUDA implementation on the GPU.
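To see how the two candidate formulas compare, here is a minimal sketch in plain Python. It uses math.erf instead of torch.erf so it runs without PyTorch, and the function names gelu_erf and gelu_tanh are illustrative. gelu_erf is the formula from the test above, x·Φ(x) with Φ the standard normal CDF, and gelu_tanh is the tanh expression from the question.

```python
import math


def gelu_erf(x: float) -> float:
    """Erf-based GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """The tanh-based approximation of GELU from the question."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Print both versions side by side for a few inputs
    for x in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        exact = gelu_erf(x)
        approx = gelu_tanh(x)
        print(f"x={x:+.1f}  erf={exact:.8f}  tanh={approx:.8f}  diff={exact - approx:+.2e}")
```

The two are close but not identical (at x = 1 they differ by about 1.5e-4), so a gradient derived from the tanh form will not exactly match the erf-based GELU. More recent PyTorch releases also expose the tanh variant directly through `torch.nn.GELU(approximate='tanh')`, with `approximate='none'` (the erf form) as the default.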

Sam_Lerman · May 6, 2020, 1:03pm

I saw that elsewhere, but it doesn't help me compute a derivative by hand, because I don't know the exact function that torch::erf computes.

KFrank · May 6, 2020, 3:52pm

Hi Sam!

The function can be found in the documentation for torch.erf().

More detail can be found in the Wikipedia entry "Error function."

As you can see, its derivative is, up to a constant factor, the probability
density function of a normal distribution: d/dx erf(x) = (2/√π) exp(−x²).
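Using the derivative of erf together with the product and chain rules, the derivative of the erf-based GELU can be written in closed form. This is a worked derivation rather than something taken from PyTorch's source; Φ and φ denote the standard normal CDF and PDF.

```latex
\begin{align*}
\mathrm{GELU}(x) &= \tfrac{1}{2}\,x\left(1 + \operatorname{erf}\!\left(\tfrac{x}{\sqrt{2}}\right)\right) = x\,\Phi(x) \\
\frac{d}{dx}\operatorname{erf}(x) &= \frac{2}{\sqrt{\pi}}\,e^{-x^{2}}
\quad\Longrightarrow\quad
\frac{d}{dx}\operatorname{erf}\!\left(\tfrac{x}{\sqrt{2}}\right) = \sqrt{\frac{2}{\pi}}\,e^{-x^{2}/2} \\
\frac{d}{dx}\mathrm{GELU}(x)
&= \tfrac{1}{2}\left(1 + \operatorname{erf}\!\left(\tfrac{x}{\sqrt{2}}\right)\right)
 + \frac{x}{\sqrt{2\pi}}\,e^{-x^{2}/2}
 = \Phi(x) + x\,\varphi(x)
\end{align*}
```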

Best.

K. Frank

Sam_Lerman · May 6, 2020, 5:53pm

Hmm, but how does PyTorch implement that integral? Is there no simple function equivalent to PyTorch's implementation that I could enter into the following website:

http://www.matrixcalculus.org

I need to differentiate an MLP that uses GELU activations, and it would be convenient if I could just enter the full formula into the tool above.
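Because GELU is applied elementwise, the activation does not need any matrix calculus of its own: in the chain rule it only contributes a factor GELU′(z_i) per hidden unit. Below is a minimal sketch in plain Python for a hypothetical one-hidden-layer model y = v · GELU(Wx + b); the model shape and the names W, b, v, x are assumptions for illustration. It compares the hand-derived gradient with respect to W against central finite differences.

```python
import math


def gelu(z):
    # Erf-based GELU: z * Phi(z)
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def gelu_grad(z):
    # d/dz GELU(z) = Phi(z) + z * phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return cdf + z * pdf


def forward(W, b, v, x):
    # Hidden pre-activations z_i = sum_j W[i][j] * x[j] + b[i]
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    # Scalar output y = sum_i v[i] * GELU(z_i)
    y = sum(vi * gelu(zi) for vi, zi in zip(v, z))
    return y, z


def grad_W(W, b, v, x):
    # Chain rule: dy/dW[i][j] = v[i] * GELU'(z_i) * x[j]
    _, z = forward(W, b, v, x)
    return [[vi * gelu_grad(zi) * xj for xj in x] for vi, zi in zip(v, z)]


def numeric_grad_W(W, b, v, x, eps=1e-6):
    # Central finite differences, perturbing one weight at a time
    grads = []
    for i in range(len(W)):
        row = []
        for j in range(len(W[i])):
            w_plus = [r[:] for r in W]
            w_plus[i][j] += eps
            w_minus = [r[:] for r in W]
            w_minus[i][j] -= eps
            diff = forward(w_plus, b, v, x)[0] - forward(w_minus, b, v, x)[0]
            row.append(diff / (2 * eps))
        grads.append(row)
    return grads


if __name__ == "__main__":
    W = [[0.5, -1.2], [0.3, 0.8], [-0.7, 0.1]]
    b = [0.1, -0.2, 0.05]
    v = [1.0, -0.5, 2.0]
    x = [0.9, -0.4]
    analytic = grad_W(W, b, v, x)
    numeric = numeric_grad_W(W, b, v, x)
    # Largest absolute disagreement between the two gradients
    print(max(abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn)))
```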

KFrank · May 6, 2020, 9:45pm

Hi Sam!

erf() is a so-called special function. It is well studied, well
understood, and (reasonably) easy to calculate with modern
numerical analysis techniques.

But there is no “simple function equivalent” to it (other than erf()
or things like the integral used to define it). So erf() is the best
you’ve got.
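As an illustration of "easy to calculate with numerical techniques," here is one textbook way to evaluate erf: its Maclaurin series, erf(x) = (2/√π) Σ (−1)ⁿ x^(2n+1) / (n!(2n+1)). This is only a sketch and is not what PyTorch or the C standard library actually use; production implementations typically use piecewise rational or polynomial approximations, and this series loses accuracy through cancellation for large |x|. The name erf_series is hypothetical.

```python
import math


def erf_series(x: float, max_terms: int = 200) -> float:
    """Approximate erf(x) with its Maclaurin series.

    Accurate for moderate |x| (roughly |x| <= 3). For larger |x| the
    alternating terms grow large and cancel, so precision degrades.
    """
    total = 0.0
    term = x  # (-1)^n * x^(2n+1) / n!, starting at n = 0
    for n in range(max_terms):
        contribution = term / (2 * n + 1)
        total += contribution
        if abs(contribution) < 1e-17:
            break
        # Advance to the next term: multiply by -x^2 / (n + 1)
        term *= -x * x / (n + 1)
    return 2.0 / math.sqrt(math.pi) * total


if __name__ == "__main__":
    for x in (0.1, 1.0, 1.234, 2.5):
        print(x, erf_series(x), math.erf(x))
```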

If you want to "compute a derivative by hand," that's easy, because
the derivative of erf() is an elementary function, namely a Gaussian
(a constant multiple of the normal distribution's density).

My pocket calculator doesn't know anything about erf(), so
there's no practical way for me to calculate erf() with my calculator.
But Python does know about erf(), so math.erf(1.234) works
just fine in Python.
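Along the same lines, Python's standard library exposes Φ and φ directly through statistics.NormalDist (Python 3.8+), which gives a second way to write GELU and its derivative and a quick cross-check of the erf form. A minimal sketch; the function names are illustrative.

```python
import math
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def gelu_via_erf(x: float) -> float:
    # GELU written with math.erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_via_normaldist(x: float) -> float:
    # GELU(x) = x * Phi(x)
    return x * STD_NORMAL.cdf(x)


def gelu_grad_via_normaldist(x: float) -> float:
    # GELU'(x) = Phi(x) + x * phi(x)
    return STD_NORMAL.cdf(x) + x * STD_NORMAL.pdf(x)


if __name__ == "__main__":
    print(math.erf(1.234))
    h = 1e-6
    for x in (-2.0, -0.5, 0.0, 1.234):
        # Central finite difference of the erf form, for comparison
        fd = (gelu_via_erf(x + h) - gelu_via_erf(x - h)) / (2 * h)
        print(x, gelu_via_erf(x), gelu_via_normaldist(x), gelu_grad_via_normaldist(x), fd)
```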

If the calculus tool you linked to knows about erf(), then you should
be good. If it doesn’t, you’ll have to “compute a derivative by hand”
(or switch to a better calculus tool).

Best.

K. Frank

Sam_Lerman · May 6, 2020, 10:11pm

I'm confused: if there is no equivalent function, then how does PyTorch compute it? From reading about GELU, it seems like the paper uses a tanh-based approximation of the erf function.
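If the goal is instead to match the tanh approximation from the GELU paper, its derivative is also elementary. With u = √(2/π)(x + 0.044715x³), the product rule gives d/dx[½x(1 + tanh u)] = ½(1 + tanh u) + ½x(1 − tanh²u)·√(2/π)(1 + 3·0.044715x²). The sketch below implements that formula and checks it against finite differences; it matches the tanh form only, not the erf-based GELU from ptrblck's test. The function names are illustrative.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
COEFF = 0.044715


def gelu_tanh(x: float) -> float:
    # tanh approximation: 0.5 * x * (1 + tanh(u))
    u = SQRT_2_OVER_PI * (x + COEFF * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(u))


def gelu_tanh_grad(x: float) -> float:
    u = SQRT_2_OVER_PI * (x + COEFF * x ** 3)
    t = math.tanh(u)
    du_dx = SQRT_2_OVER_PI * (1.0 + 3.0 * COEFF * x ** 2)
    # Product rule on 0.5 * x * (1 + tanh(u)), using d/du tanh(u) = 1 - tanh(u)^2
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * du_dx


if __name__ == "__main__":
    h = 1e-6
    for x in (-2.0, -0.5, 0.0, 1.0, 2.0):
        fd = (gelu_tanh(x + h) - gelu_tanh(x - h)) / (2 * h)
        print(x, gelu_tanh_grad(x), fd)
```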