While trying to implement the following paper in PyTorch (https://arxiv.org/abs/1705.08039), I ran into issues with my gradients. I checked the sanity of the values and everything seems correct, but I still get NaN values for some gradients.
Here is the repo with the implementation in a Jupyter Notebook:
Also, since this is the first time I have tried to implement a paper using PyTorch primitives, I would be glad if someone more experienced could review my code and maybe point out some best practices and/or optimizations I could apply.
Thank you in advance for your help and guidance with the code review!
Nice code! I got it to work after a couple fixes:
Change

params = params.div(norm-EPS)

to

params = params.div(norm) - EPS

θ/∥θ∥ − ε is what the paper specifies: EPS needs to be outside the denominator, otherwise you're pushing the large embeddings further out instead of pulling them back inside the ball.
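To illustrate the fix, here is a minimal NumPy sketch of the projection step (the EPS value and the `project` name are my own choices, not from your notebook). I implement the common reading of the paper's proj(θ): any row whose norm reaches 1 is rescaled to norm 1 − EPS, which keeps it strictly inside the open unit ball:

```python
import numpy as np

EPS = 1e-5  # margin from the ball's boundary; the exact value is an assumption

def project(params):
    # After an SGD update, some embedding rows may have norm >= 1 and
    # thus lie outside the open Poincare ball. Rescale those rows to
    # norm 1 - EPS, just inside the boundary; leave the rest untouched.
    norm = np.linalg.norm(params, axis=-1, keepdims=True)
    scale = np.where(norm >= 1.0, (1.0 - EPS) / norm, 1.0)
    return params * scale
```

With the EPS *inside* the denominator, as in the original line, a row with norm slightly above 1 would be divided by a number slightly *smaller* than its norm, so its norm would stay above 1 and the embedding would never re-enter the ball.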
Change

gamma = gamma.clamp(min=1)

to

gamma = gamma.clamp(min=1+EPS)

This enforces a minimal distance between equivalent embeddings (the same word). It may seem counter-intuitive, but it avoids a division by zero in the derivative (see the definition of γ in the partial derivative of the Poincaré distance in the paper).
I have tried this paper (https://arxiv.org/abs/1707.07847), which does something similar with Poincaré embeddings, and I can't replicate results as good as those reported.
Amazing, I corrected the code as you mentioned and pushed the new version to GitHub. We might still need to work on the speed and complexity of the implementation, since it is very slow compared to the C++ implementation.
Thank you for your help; if you have any suggestions to make it faster, let me know.
What C++ implementation are you referring to?
A reference PyTorch implementation is now also available here: https://github.com/facebookresearch/poincare-embeddings