Thanks, I got it.
It seems I got a large loss because the probability of the predicted class at each time step is relatively small, even though it was the largest one at that time step. I don't know which works better in real-world use: a small loss (predicting something with extreme confidence) or a relatively large loss (predicting the same thing, but with less certainty).
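
To make the effect concrete, here is a minimal sketch. It assumes the loss is an ordinary per-time-step cross-entropy (negative log-likelihood) averaged over the sequence, which may not match the actual loss used here. The probabilities, the targets, and the helper names `sequence_nll` and `argmax_path` are all made up for illustration.

```python
import math

def sequence_nll(step_probs, targets):
    """Mean negative log-likelihood of the target class over all time steps."""
    total = sum(-math.log(probs[t]) for probs, t in zip(step_probs, targets))
    return total / len(targets)

def argmax_path(step_probs):
    """Index of the most probable class at each time step."""
    return [max(range(len(probs)), key=probs.__getitem__) for probs in step_probs]

# Hypothetical 3-step sequence with 3 classes; values are illustrative only.
targets = [0, 2, 1]

# Same argmax at every step, but very different confidence.
confident = [[0.98, 0.01, 0.01], [0.01, 0.01, 0.98], [0.01, 0.98, 0.01]]
hesitant = [[0.40, 0.30, 0.30], [0.30, 0.30, 0.40], [0.30, 0.40, 0.30]]

loss_confident = sequence_nll(confident, targets)
loss_hesitant = sequence_nll(hesitant, targets)

print(argmax_path(confident), argmax_path(hesitant))
print(f"confident: {loss_confident:.4f}  hesitant: {loss_hesitant:.4f}")
```

Both sets of outputs decode to the same sequence, yet the hesitant one has a much larger loss, since each step contributes -log(0.40) instead of -log(0.98). So under this assumption the loss value alone measures confidence in the target, not whether the decoded prediction is right.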