PyTorch 0.2, NaN for embedding

Hi all, I am trying to implement BPR (https://arxiv.org/pdf/1205.2618), and I am getting NaN values in the embeddings. I am wondering whether it is a bug in my code or whether the cause is that "the gradient of torch.norm at 0 (in version 0.2 and before) is NaN" (Embeddings become NaN).
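For reference, a minimal check (an illustrative snippet, not part of my model, assuming the old Variable API of 0.2/0.3) that shows whether the installed version returns a NaN gradient for torch.norm at zero:

import torch
from torch.autograd import Variable  # required on 0.2/0.3; tensors with requires_grad=True work on >= 0.4

x = Variable(torch.zeros(5), requires_grad=True)
torch.norm(x).backward()
print(x.grad)  # NaN on affected versions, since d||x||/dx = x/||x|| evaluates to 0/0 at x = 0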
My code is:

import torch
import torch.nn as nn


class BPR_MF(nn.Module):
    def __init__(self, user_num, item_num, embedding_dim):
        super(BPR_MF, self).__init__()
        self.user_embedding = nn.Embedding(user_num, embedding_dim)
        self.item_embedding = nn.Embedding(item_num, embedding_dim)
        self.logSigmoid = nn.LogSigmoid()
        self.item_num = item_num
        self.user_num = user_num

    def forward(self, input_triple):
        # input_triple = (user indices, positive item indices, negative item indices)
        u, i, j = input_triple
        user_embed = self.user_embedding(u)
        item_i_embed = self.item_embedding(i)
        item_j_embed = self.item_embedding(j)
        # BPR score: preference of the positive item i over the negative item j for user u
        score = (user_embed * (item_i_embed - item_j_embed)).sum()
        log_prob = self.logSigmoid(score)
        # L2 regularization term over the embeddings used in this triple
        reg = (user_embed * user_embed).sum() + (item_i_embed * item_i_embed).sum() + (item_j_embed * item_j_embed).sum()
        return log_prob, reg

and the loss function is:

    loss = -1.0 * log_prob + reg * regularize  # the target is to maximize log_prob, so multiply by -1
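For context, here is a minimal sketch of how the model and this loss could be wired together; the sizes, optimizer, learning rate, and regularize weight below are placeholder assumptions, and on 0.2/0.3 the index tensors would additionally need to be wrapped in Variable:

import torch

model = BPR_MF(user_num=1000, item_num=5000, embedding_dim=32)  # placeholder sizes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
regularize = 1e-4  # assumed L2 weight

# one (user, positive item, negative item) training triple
u, i, j = torch.LongTensor([0]), torch.LongTensor([1]), torch.LongTensor([2])

log_prob, reg = model((u, i, j))
loss = -1.0 * log_prob + reg * regularize
optimizer.zero_grad()
loss.backward()
optimizer.step()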

Thanks in advance

Not sure whether this will help, but try clip_grad_norm.
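For example, clipping right before the optimizer step (the max_norm value here is arbitrary; the function was later renamed to clip_grad_norm_):

from torch.nn.utils import clip_grad_norm

optimizer.zero_grad()
loss.backward()
clip_grad_norm(model.parameters(), max_norm=1.0)  # rescale gradients whose total norm exceeds 1.0
optimizer.step()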

Thank you, the problem was solved by updating to 0.3.