I have a question about the RNN-T transducer loss

When using the RNN-T transducer loss, the reduction argument offers 'sum' and 'average' options, which as I understand it means the update uses either the sum of the losses over the batch or their average, respectively.
But I wonder why this distinction exists.
In the case of CTC loss, the loss sometimes comes out as inf, so the zero_infinity flag combined with the 'sum' reduction can help; but the transducer loss does not produce inf values. So what is the practical difference in training behavior between the two reduction methods?
In any case, the loss is -log(x)… and dividing it by a constant only rescales the gradient by that same constant factor, which could just as well be absorbed into the learning rate.
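To make concrete what I mean, here is a toy sketch (with made-up per-sample values, not a real RNN-T lattice computation) showing that 'sum' and 'average' reductions produce gradients that differ only by the constant factor 1/B:

```python
import numpy as np

# Hypothetical per-sample losses L_i = -log(x_i) over a batch of size B.
# The real RNN-T loss is a sum over alignment lattice paths, but the
# reduction step at the end behaves the same way.
x = np.array([0.9, 0.5, 0.2, 0.7])
B = len(x)

# d/dx_i of -log(x_i) is -1/x_i.
grad_sum = -1.0 / x        # gradient under the 'sum' reduction
grad_mean = grad_sum / B   # gradient under the 'average' reduction

# The two gradients differ only by the constant factor 1/B:
print(np.allclose(grad_mean * B, grad_sum))  # True
```

So with plain SGD the two reductions seem equivalent up to a learning-rate rescaling, which is why I am asking what difference they actually make in practice.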