That looks very normal for a simple setup. You could apply a moving average to your values; that might make it easier to tell what is happening. Also look into TensorBoard for logging training scalars such as loss over time (it can show you many other things as well).
I think you are not using a learning rate schedule, correct? Adding one might improve your results a lot.
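To make the scheduling idea concrete, here is a minimal, framework-agnostic sketch of step decay, where the learning rate is multiplied by a fixed factor every few epochs. The function name `step_decay_lr` and the values for `step_size` and `gamma` are made up for illustration, not taken from your setup; most deep learning frameworks ship built-in schedulers that you would normally use instead.

```python
def step_decay_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Return the learning rate for a given epoch under step decay.

    Hypothetical helper: every `step_size` epochs the rate is
    multiplied by `gamma`. Both defaults are illustrative only.
    """
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    # Number of decay steps that have happened so far.
    num_decays = epoch // step_size
    return base_lr * (gamma ** num_decays)


if __name__ == "__main__":
    # Print the schedule for a few epochs to see where the drops occur.
    for epoch in (0, 9, 10, 20, 30):
        print(epoch, step_decay_lr(0.1, epoch))
```

In a training loop you would compute the rate at the start of each epoch and pass it to your optimizer. Whether step decay, cosine decay, or something else works best depends on the problem, so treat this as a starting point.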
Thanks for your reply. I did not use a learning rate schedule. How do I apply a moving average?
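One way to do this, as a rough sketch in plain Python: keep the per-step loss values in a list and smooth them before plotting. The function names `moving_average` and `exponential_moving_average`, and the `window` and `alpha` defaults, are made up for illustration; the loss values in the example are placeholders, not real results.

```python
def moving_average(values, window=3):
    """Simple moving average over a sliding window.

    Returns len(values) - window + 1 points, one per full window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    # Keep a running sum so each step is O(1) instead of re-summing.
    running = sum(values[:window])
    averaged = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averaged.append(running / window)
    return averaged


def exponential_moving_average(values, alpha=0.1):
    """Exponential moving average; smaller alpha means heavier smoothing.

    Returns one smoothed point per input value.
    """
    smoothed = []
    current = None
    for v in values:
        # The first value seeds the average; later values are blended in.
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values
    print(moving_average(losses, window=3))
    print(exponential_moving_average(losses, alpha=0.5))
```

The simple version drops the first `window - 1` points, while the exponential version keeps the same length as the input, which is convenient when plotting against the step number. A larger `window` (or smaller `alpha`) gives a smoother curve but reacts more slowly to real changes in the loss.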