I am trying to use a beta-binomial loss, and I am running into all sorts of pain computing the gradient of the gamma function (there has been a separate discussion of this elsewhere on this forum).

Thinking about it a little, it occurs to me that since the Beta function is quite smooth, it should be possible to precompute its gradient on a pre-specified grid of points and then interpolate the gradient values at runtime. This seems especially reasonable since, for large values of the parameters (a, b), the Beta distribution is well approximated by a Gaussian.
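To make the idea concrete, here is a minimal sketch of the interpolation strategy in NumPy (not PyTorch): tabulate log-gamma on a grid, numerically differentiate it to get an approximate digamma table (digamma is the gradient term a beta-binomial loss needs), and linearly interpolate at runtime. The grid range and resolution are arbitrary assumptions, and `digamma_interp` is a hypothetical helper name:

```python
import math
import numpy as np

# Precompute d/dx log Gamma(x) (the digamma function) on a fixed grid.
# Grid bounds and resolution are assumptions; they must cover the range
# of (a, b) values the loss will actually see.
grid = np.linspace(1.0, 100.0, 2000)
lgamma_vals = np.vectorize(math.lgamma)(grid)
digamma_tab = np.gradient(lgamma_vals, grid)  # central differences on the grid

def digamma_interp(x):
    """Linearly interpolate the precomputed gradient table at runtime."""
    return np.interp(x, grid, digamma_tab)

# Sanity check: compare against a central finite difference of lgamma
# at an arbitrary interior point.
x0, h = 7.3, 1e-5
fd = (math.lgamma(x0 + h) - math.lgamma(x0 - h)) / (2 * h)
```

Since the table is just a pair of 1-D tensors, the same lookup-plus-interpolation could in principle be expressed with tensor indexing operations so it stays on the GPU, though I have not worked that part out.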

So, my question is: is there anything wrong with my understanding here? I am asking since I am relatively new to PyTorch and deep learning. I assume a lookup-table strategy would be easy to implement directly with tensor operations in Python, and I am also wondering whether nn.LookupTable could be repurposed to this end. Just wanted to check before going down this path, thanks!