Accuracy of nn.NLLLoss

I am trying to use nn.NLLLoss. The code is:

import torch
import torch.nn as nn

loss = nn.NLLLoss()
loss_by_torch = loss(predictions_logp, actual_tokens)

There is another method to compute it:

loss_by_gather = -torch.mean(torch.gather(predictions_logp, dim=1, index=actual_tokens[:,None]))
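For clarity, here is what that gather call does on a tiny stand-in example (shapes and values are made up purely for illustration):

import torch

# Toy log-probabilities: 3 samples, 4 classes (stand-in data).
logp = torch.log_softmax(torch.randn(3, 4), dim=1)
targets = torch.tensor([2, 0, 3])

# gather picks logp[i, targets[i]] for each row i; targets[:, None]
# reshapes the index to [3, 1] so it has the same number of
# dimensions as logp, which torch.gather requires.
picked = torch.gather(logp, dim=1, index=targets[:, None])  # shape [3, 1]
print(-picked.mean())  # the NLL estimate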

And another one by using a function like this:

def compute_NLLLoss(logs, targets):
    # Pick out the log-probability assigned to the target class of
    # each sample, then average and negate.
    out = torch.zeros_like(targets, dtype=torch.float)
    for i in range(len(targets)):
        out[i] = logs[i][targets[i]]
    return -torch.mean(out)

loss_by_func = compute_NLLLoss(predictions_logp, actual_tokens)
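For reference, here is a self-contained version of the comparison, using random stand-in data in place of my real minibatch (compute_NLLLoss is the function above):

import torch
import torch.nn as nn

torch.manual_seed(0)
N, C = 87312, 85  # same shape as the real minibatch

# Random stand-in data: row-normalized log-probabilities and integer targets.
predictions_logp = torch.log_softmax(torch.randn(N, C), dim=1)
actual_tokens = torch.randint(0, C, (N,))

loss_by_torch = nn.NLLLoss()(predictions_logp, actual_tokens)
loss_by_gather = -torch.mean(
    torch.gather(predictions_logp, dim=1, index=actual_tokens[:, None])
)
# The Python loop in compute_NLLLoss is slow at this size, but it finishes.
loss_by_func = compute_NLLLoss(predictions_logp, actual_tokens)

print(loss_by_torch.item(), loss_by_gather.item(), loss_by_func.item())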

So, loss_by_gather and loss_by_func give exactly the same value, while loss_by_torch differs slightly.
For one minibatch, the shape of predictions_logp is [87312, 85], and the difference between loss_by_gather (or loss_by_func) and loss_by_torch varies between -0.001 and 0.001.
Does the torch implementation use a different formula?
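One way to check whether the gap is just float32 accumulation order (summing 87312 terms in a different order can plausibly shift the mean on the order of 1e-3) rather than a different formula would be to repeat the comparison in double precision, e.g. with the stand-in data from the sketch above:

logp64 = predictions_logp.double()
loss64_torch = nn.NLLLoss()(logp64, actual_tokens)
loss64_gather = -torch.mean(torch.gather(logp64, dim=1, index=actual_tokens[:, None]))

# If both compute the same mean negative log-likelihood, this gap
# should shrink by several orders of magnitude in float64.
print((loss64_torch - loss64_gather).item())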