I’m training a char RNN with PyTorch, and the loss is erratic. I compared my code with Andrej Karpathy’s famous 100-line gist, and noticed he displays a smoothed loss rather than the raw per-iteration loss. When I printed the raw loss from his code, it looked just as erratic as mine.
- Do we need a smoothed loss?
- Why isn’t the raw loss decreasing?
- Is this behavior specific to char RNNs?
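For context on what I mean by smoothed loss: the gist keeps an exponential moving average (EMA) of the per-iteration loss (`smooth_loss = smooth_loss * 0.999 + loss * 0.001`). Here is a minimal sketch of that idea; the loss values below are synthetic stand-ins, not output from my actual training run:

```python
import random

random.seed(0)

smooth_loss = None
for step in range(1000):
    # Fake noisy per-iteration loss that trends downward overall.
    loss = 4.0 * (0.999 ** step) + random.uniform(-0.5, 0.5)

    if smooth_loss is None:
        smooth_loss = loss  # initialize the EMA with the first value
    else:
        # EMA update as in the gist: heavy weight on history, light on the new sample.
        smooth_loss = 0.999 * smooth_loss + 0.001 * loss

    if step % 200 == 0:
        print(f"step {step:4d}  raw {loss:6.3f}  smooth {smooth_loss:6.3f}")
```

The raw values jump around from batch to batch, while the EMA changes slowly, which makes the overall trend much easier to see.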
Any help from the community would be appreciated.