Hi, for a particular reason I need to hand-derive the derivative of a GELU, and it has to match the GELU implementation in PyTorch. I tried finding the exact GELU formula and I'm a little confused. Could someone help me by confirming whether the following is exactly equivalent to it:
0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
If not, could you provide the correct explicit formula? (Not for the derivative, just for the raw GELU)
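For what it's worth, that tanh-based formula (with the parenthesis count fixed) can be checked directly against PyTorch's own tanh-approximation mode. A minimal sketch, assuming a PyTorch version (>= 1.12) that supports the `approximate="tanh"` argument of `torch.nn.functional.gelu`:

```python
import math
import torch

def gelu_tanh(x):
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi)
                                       * (x + 0.044715 * torch.pow(x, 3))))

x = torch.linspace(-5.0, 5.0, steps=101)
ref = torch.nn.functional.gelu(x, approximate="tanh")
print(torch.allclose(gelu_tanh(x), ref, atol=1e-6))
```

Note that this only matches PyTorch's *approximate* GELU; the default mode uses erf, as discussed below in the thread.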
Based on this test, gelu should correspond to:
const auto y_exp = x * 0.5 * (1.0 + torch::erf(x / std::sqrt(2.0)));
torch::erf is presumably dispatched to the C standard library's erf or to the CUDA implementation.
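For reference, a minimal Python translation of that C++ line, assuming it corresponds to the default `torch.nn.functional.gelu` path, would be:

```python
import math
import torch

def gelu_exact(x):
    # x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

x = torch.randn(1000)
print(torch.allclose(gelu_exact(x), torch.nn.functional.gelu(x), atol=1e-6))
```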
I saw that elsewhere, but it doesn’t help me compute a derivative by hand because I don’t know the exact function that torch::erf takes.
The function can be found in the documentation for torch.erf().
More detail can be found in the wikipedia entry Error function.
As you can see, its derivative is just a scaled Gaussian, erf'(t) = 2/sqrt(pi) * exp(-t^2), which is the probability density function of a normal distribution up to rescaling.
Hmm, but how does PyTorch implement that integral? Is there no simple closed-form equivalent to PyTorch's implementation that I could enter into the following website:
I need to derive an MLP that uses GeLU activations and it would be convenient if I could just enter the full formula into that tool above.
erf() is a so-called special function. It is well studied, well understood, and (reasonably) easy to calculate with modern numerical analysis techniques.

But there is no "simple function equivalent" to it (other than things like the integral used to define it). So writing the result in terms of erf() is the best you can do.

If you want to "compute a derivative by hand," that's easy, because the derivative of erf() is an elementary function, namely the scaled Gaussian 2/sqrt(pi) * exp(-x^2).
My pocket calculator doesn't know anything about erf(), so there's no practical way for me to calculate erf() with my calculator.
But python does know about erf(): math.erf(1.234) works just fine in python.
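To illustrate both points, here is the standard-library call together with a finite-difference check that the derivative really is the scaled Gaussian (the step size and tolerance are assumptions for a central difference in double precision):

```python
import math

print(math.erf(1.234))  # roughly 0.919

x, h = 1.234, 1e-6
numeric = (math.erf(x + h) - math.erf(x - h)) / (2.0 * h)
analytic = 2.0 / math.sqrt(math.pi) * math.exp(-x * x)
print(abs(numeric - analytic) < 1e-8)
```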
If the calculus tool you linked to knows about erf(), then you should be good. If it doesn't, you'll have to "compute a derivative by hand" (or switch to a better calculus tool).
I'm confused: if there is no simple function equivalent, then how does PyTorch compute it? From reading about GELU, it seems like the paper used a tanh-based approximation of the erf function.
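The answer is that the underlying math libraries compute erf() numerically, typically with polynomial or rational approximations, which is how most special functions are evaluated. As an illustration only (this is not PyTorch's actual code), here is the classic Abramowitz & Stegun formula 7.1.26, accurate to about 1.5e-7:

```python
import math

def erf_approx(x):
    # Abramowitz & Stegun 7.1.26: erf(x) ~ 1 - poly(t) * exp(-x^2),
    # with t = 1/(1 + p*x), valid for x >= 0; erf is odd, so flip the sign.
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = t * (0.254829592
           + t * (-0.284496736
           + t * (1.421413741
           + t * (-1.453152027
           + t * 1.061405429))))
    return sign * (1.0 - poly * math.exp(-x * x))

# Worst-case error against the standard library over [-3, 3]:
err = max(abs(erf_approx(i / 100.0) - math.erf(i / 100.0))
          for i in range(-300, 301))
print(err)
```

The tanh formula in the GELU paper is a further approximation layered on top of this idea; PyTorch's default GELU uses the "real" erf() from the platform's math library, not the tanh form.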