Backward of self-defined loss function is very slow

I defined a loss function myself, but `loss.backward()` takes a very long time.
The code is pasted below. Training is apparently slower than an equivalent TensorFlow implementation. Is the reason that I defined the loss function in an improper way?

        import torch
        import torch.nn.functional as F

        def func(logits, target):
            # Soft-target cross-entropy: -sum(target * log_softmax(logits)),
            # averaged over the batch.
            log_likelihood = -F.log_softmax(logits, dim=1)
            batch = logits.shape[0]
            loss = torch.sum(torch.mul(log_likelihood, target)) / batch
            return loss
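For what it's worth, this hand-written loss is numerically identical to PyTorch's built-in cross-entropy when the targets are one-hot, so the math itself is not the problem. A minimal sanity check (the shapes and seed here are made up for illustration):

```python
import torch
import torch.nn.functional as F

def func(logits, target):
    # Same loss as in the post: soft-target cross-entropy, batch-averaged.
    log_likelihood = -F.log_softmax(logits, dim=1)
    batch = logits.shape[0]
    return torch.sum(torch.mul(log_likelihood, target)) / batch

torch.manual_seed(0)
logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
target = F.one_hot(labels, num_classes=10).float()

manual = func(logits, target)
builtin = F.cross_entropy(logits, labels)  # default reduction='mean'
print(torch.allclose(manual, builtin))  # True
```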


This code should already be about as efficient as it can be.
How are you measuring the runtime?
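One common pitfall when timing `loss.backward()` on a GPU: CUDA kernels are launched asynchronously, so `backward()` often just enqueues work and the wall-clock cost shows up at the next synchronization point. A rough timing helper that accounts for this (the function name and run counts are my own choices, not from any particular library):

```python
import time
import torch

def timed(fn, n_warmup=3, n_runs=10):
    # Warm-up runs absorb one-time costs (allocator growth, cuDNN autotuning).
    for _ in range(n_warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain any pending GPU work first
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the timed GPU work to finish
    return (time.perf_counter() - start) / n_runs

# Example: time forward + backward of a tiny model on whatever device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(100, 10).to(device)
x = torch.randn(32, 100, device=device)
y = torch.randint(0, 10, (32,), device=device)

def step():
    model.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()

avg = timed(step)
print(f"avg step time: {avg:.6f}s")
```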

Oh, I was wrong. The loss function is not the reason for the slow training speed.