Why should the entropy term of KLDivLoss be calculated? Isn't it meaningless when computing gradients?

Since the first term of KLDivLoss^2 (the negative entropy of the ground truth, y_true * log(y_true)) is constant with respect to the prediction, it contributes nothing to the gradient.
I also checked in my notebook^3 that the gradients computed by KLDivLoss and CrossEntropyLoss are equal (Figure 1).
So, what is the use case for calculating the entropy term of KLDivLoss? Isn't it meaningless when computing gradients?

Figure 1: Comparison between gradient of CELoss and KLDivLoss.
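For reference, here is a minimal sketch of the kind of check done in the notebook (assuming a recent PyTorch, ≥ 1.10, where F.cross_entropy accepts probability targets):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 5, requires_grad=True)
target = torch.softmax(torch.randn(4, 5), dim=-1)  # soft ground-truth distribution

# KLDivLoss expects log-probabilities as input and probabilities as target.
log_probs = F.log_softmax(logits, dim=-1)
kl = F.kl_div(log_probs, target, reduction="batchmean")
grad_kl, = torch.autograd.grad(kl, logits)

# Cross-entropy with soft (probability) targets.
ce = F.cross_entropy(logits, target)
grad_ce, = torch.autograd.grad(ce, logits)

# The entropy term is constant w.r.t. the logits, so the gradients coincide.
print(torch.allclose(grad_kl, grad_ce, atol=1e-6))  # expected: True
```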

My second question: if they are equivalent, why are these two functions implemented separately? Would it be simple to merge them into a single function?

So, what is the use case for calculating the entropy term of KLDivLoss?

I came up with a use case: in contrastive learning, the ground truth is itself the prediction of a model, and gradients also need to back-propagate through the ground-truth labels. In that case the entropy term is no longer constant, so it does affect the gradient.
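A rough illustration of that case (hypothetical "student"/"teacher" names; this assumes F.kl_div in recent PyTorch propagates gradients to the target tensor as well): the full KL and the cross-entropy-only part give the same gradient for the prediction branch, but different gradients for the back-propagated ground-truth branch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

student_logits = torch.randn(4, 5, requires_grad=True)  # prediction branch
teacher_logits = torch.randn(4, 5, requires_grad=True)  # "ground truth" is itself a model output

log_p = torch.log_softmax(student_logits, dim=-1)
q = torch.softmax(teacher_logits, dim=-1)

# Full KL divergence: keeps the q * log(q) entropy term.
kl = F.kl_div(log_p, q, reduction="batchmean")
grad_kl, = torch.autograd.grad(kl, teacher_logits, retain_graph=True)

# Cross-entropy-only surrogate: the entropy term is dropped.
ce = -(q * log_p).sum(dim=-1).mean()
grad_ce, = torch.autograd.grad(ce, teacher_logits)

# Gradients w.r.t. student_logits would still match,
# but w.r.t. the back-propagated "ground truth" they differ.
print(torch.allclose(grad_kl, grad_ce))  # expected: False
```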