Implementation of Poincaré Embeddings

While trying to implement the following paper in PyTorch, I encountered some issues with my gradients. I checked the sanity of the values and everything seems correct, but I still get NaN values for some gradients.

Here is the repo implementing the code in a Jupyter Notebook:

Also, since this is the first time I have tried to implement a paper in PyTorch primitives, I would be glad if someone more experienced could review my code and maybe point out some best practices and/or optimizations that I could apply.

Thank you for your precious help and guidance for the code review!


Nice code! I got it to work after a couple of fixes:

instead of:

params = params.div(norm-EPS)

use:

params = params.div(norm) - EPS

θ/∥θ∥−ε is what the paper says – EPS needs to be outside the division; otherwise you're pushing the big embeddings further out instead of pulling them back inside the ball.
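The corrected projection could be sketched as below. This is only an illustration of the fix discussed here, not code from the repo; the function name `project` and the value of `EPS` are my own assumptions:

```python
import torch

EPS = 1e-5  # assumed value; the thread only names the constant


def project(params: torch.Tensor) -> torch.Tensor:
    """Project embeddings whose norm has reached 1 back inside the unit ball.

    Uses the corrected expression from this thread,
    params.div(norm) - EPS, i.e. EPS is subtracted *after* the division,
    so the result lands just inside the boundary.
    """
    norm = params.norm(dim=-1, keepdim=True)
    projected = params.div(norm) - EPS
    # Leave points that are already strictly inside the ball untouched.
    return torch.where(norm >= 1, projected, params)
```

With `params.div(norm - EPS)` the denominator shrinks, so a point with norm ≥ 1 gets scaled to norm slightly *above* 1 and stays outside the ball, which is where the NaNs come from.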

instead of:

gamma = gamma.clamp(min=1)

use:

gamma = gamma.clamp(min=1+EPS)

This enforces a minimal distance between equivalent embeddings (the same word). Even though this seems counter-intuitive, it avoids a division by zero in the derivative (see the definition of γ in the partial derivative of the Poincaré distance in the paper).
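A minimal sketch of why the clamp margin matters, assuming the standard Poincaré distance from the paper (the function name and `EPS` value are mine): the gradient of arcosh contains a 1/√(γ²−1) factor, so γ must stay strictly above 1.

```python
import torch

EPS = 1e-5  # assumed clamp margin


def poincare_distance(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Poincare distance d(u, v) = arcosh(gamma), with the clamp fix.

    gamma = 1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)).
    Clamping gamma to 1 + EPS (not just 1) keeps gamma^2 - 1 > 0, so the
    1 / sqrt(gamma^2 - 1) factor in the backward pass stays finite.
    """
    sq_dist = (u - v).pow(2).sum(dim=-1)
    alpha = 1 - u.pow(2).sum(dim=-1)
    beta = 1 - v.pow(2).sum(dim=-1)
    gamma = (1 + 2 * sq_dist / (alpha * beta)).clamp(min=1 + EPS)
    return torch.acosh(gamma)
```

With `clamp(min=1)`, two identical embeddings give γ = 1 exactly and the backward pass divides by √(γ²−1) = 0, producing the NaN gradients described above; with the margin, the gradient is finite.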


I have tried this paper (… something similar on Poincaré embeddings), and I can't replicate results as good as those reported.

Amazing, I corrected the code as you mentioned and have pushed the new version to GitHub. We may need to work on the speed and complexity of the implementation, since it is very slow compared to the C++ implementation.

Thank you for your help. If you have any ideas for making it faster, let me know.

What C++ implementation are you referring to?

This one:

A reference PyTorch implementation is now also available here: