Implementation of Poincaré Embeddings

Hi,
While trying to implement the following paper in PyTorch, https://arxiv.org/abs/1705.08039, I ran into some issues with my gradients. I sanity-checked the values and everything seems correct, but I still get NaN values in some of the gradients.
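For context, here is a self-contained toy snippet that shows the flavor of NaN I am getting, together with PyTorch's anomaly detection mode, which names the op that produced it (whether this is the same mechanism as in my notebook is an assumption on my part):

```python
import torch

u = torch.zeros(2, requires_grad=True)
v = torch.zeros(2)  # identical point, so gamma == 1 exactly
with torch.autograd.set_detect_anomaly(True):
    gamma = 1 + 2 * (u - v).pow(2).sum()  # denominators dropped for brevity
    d = torch.acosh(gamma)  # d(acosh)/d(gamma) = 1/sqrt(gamma^2 - 1) = inf at 1
    d.backward()  # inf * 0 in the chain rule gives NaN; anomaly mode raises here
```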

Here is the repo with the implementation as a Jupyter notebook:

Also, since this is the first time I have tried to implement a paper using PyTorch primitives, I would be glad if someone more experienced could review my code and point out best practices and/or optimizations I could apply.

Thank you in advance for your help and guidance with the code review!


Nice code! I got it to work after a couple of fixes:

instead of:

params = params.div(norm-EPS)

do:

params = params.div(norm) - EPS

θ/∥θ∥ − ε is what the paper says, so EPS needs to be outside the division, not subtracted from the norm in the denominator; dividing by (norm − EPS) actually scales the big embeddings up instead of pulling them back inside the unit ball.
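In case it helps, here is a minimal sketch of the whole projection step as I understand it (the EPS value and applying it only to rows that left the ball are my assumptions, not code from your notebook):

```python
import torch

EPS = 1e-5  # assumed value for the margin

def project_to_ball(params: torch.Tensor) -> torch.Tensor:
    # Row-wise norms, kept as a column so they broadcast over the embedding dim.
    norm = params.norm(dim=-1, keepdim=True)
    # The paper's constraint projection theta/||theta|| - eps, with EPS
    # outside the division; rows still inside the unit ball are untouched.
    return torch.where(norm >= 1, params / norm - EPS, params)
```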

instead of:

gamma = gamma.clamp(min=1)

do:

gamma = gamma.clamp(min=1+EPS)

This enforces a minimal distance even between equivalent embeddings (the same word). Even though it seems counter-intuitive, it avoids a zero divisor in the gradient: the distance is arcosh(γ), whose derivative 1/√(γ² − 1) blows up as γ → 1 (see the definition of γ and the partial derivative of the Poincaré distance in the paper).
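Putting the two fixes together, the distance computation would look roughly like this (a sketch using torch.acosh and an assumed EPS, not the notebook's actual code):

```python
import torch

EPS = 1e-5  # assumed small constant

def poincare_distance(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    sq_dist = (u - v).pow(2).sum(dim=-1)
    alpha = 1 - u.pow(2).sum(dim=-1)  # 1 - ||u||^2, positive inside the ball
    beta = 1 - v.pow(2).sum(dim=-1)   # 1 - ||v||^2
    gamma = 1 + 2 * sq_dist / (alpha * beta)
    # Clamp at 1 + EPS: d(acosh)/d(gamma) = 1/sqrt(gamma^2 - 1) diverges at 1.
    return torch.acosh(gamma.clamp(min=1 + EPS))
```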


I have tried this paper (https://arxiv.org/abs/1707.07847), which does something similar with Poincaré embeddings, and I cannot replicate results as good as the ones reported.

Amazing, I made the corrections you mentioned and pushed the new version to GitHub. We might still need to work on the speed and complexity of the implementation, since it is very slow compared to the C++ implementation.

Thank you for your help! If you have any input on making it faster, let me know.
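One thing I suspect, though I have not profiled it yet (so this is only a guess at where the time goes), is the Python loop over negative samples; broadcasting the distance over all samples at once might help:

```python
import torch

def batched_poincare_distance(u: torch.Tensor, v: torch.Tensor,
                              eps: float = 1e-5) -> torch.Tensor:
    # u: (batch, dim); v: (batch, n_samples, dim) -> distances (batch, n_samples)
    u = u.unsqueeze(1)                     # (batch, 1, dim), broadcasts against v
    sq_dist = (u - v).pow(2).sum(dim=-1)   # (batch, n_samples)
    alpha = 1 - u.pow(2).sum(dim=-1)       # (batch, 1)
    beta = 1 - v.pow(2).sum(dim=-1)        # (batch, n_samples)
    gamma = 1 + 2 * sq_dist / (alpha * beta)
    return torch.acosh(gamma.clamp(min=1 + eps))
```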

What C++ implementation are you referring to?

This one: https://github.com/TatsuyaShirakawa/poincare-embedding

A reference PyTorch implementation is now also available here: https://github.com/facebookresearch/poincare-embeddings
