Loss trend behaves strangely during training

I am working on a visual speech recognition task, and I have noticed a strange phenomenon.
In the first several epochs, the loss decreases at a normal rate: within an epoch, the loss goes down as the number of iterations increases. In later epochs, however, the loss stays almost unchanged throughout the whole epoch, but then drops at the start of the next epoch (see the sketch below).
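
To make the setup concrete, here is a toy, self-contained sketch of the kind of training loop and loss logging I mean. The model, data, and names are placeholders for illustration, not my actual visual speech recognition code:

```python
# Toy sketch: a dummy model on random data, logging the running
# average loss within each epoch (placeholder for my real setup).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

data = TensorDataset(torch.randn(1024, 64), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

for epoch in range(5):
    running_loss = 0.0
    for step, (x, y) in enumerate(loader, start=1):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # This running average within the epoch is the number that
        # stays nearly flat late in an epoch and then drops when
        # a new epoch starts.
        if step % 8 == 0:
            print(f"epoch {epoch} step {step} avg loss {running_loss / step:.4f}")
```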
Sometimes I interrupted the run after an epoch finished and restarted from the saved weights; the loss then decreased as well.
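
This is roughly how I save and resume (again a simplified sketch with placeholder names and paths, not my exact code):

```python
import torch
import torch.nn as nn

# Same dummy model as in the sketch above (placeholder for my real network).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save at the end of an epoch...
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "ckpt.pt")

# ...then later restart training from the saved weights.
ckpt = torch.load("ckpt.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
# After resuming, the loss drops again, just as it does at a
# normal epoch boundary.
```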
I am very confused by this phenomenon. Can someone help me figure out what is going on? Thanks!