There’s a similar post here, "Strange behavior with SGD momentum training", which also shows a saw-toothed loss. smth's suggestion there was to sample with replacement.
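In case it helps, here is a rough sketch of what "sampling with replacement" means for the indices that get fed into each batch. It is plain Python and not taken from the linked thread; the function names, dataset size, and batch size are made up for illustration. As I understand it, drawing with replacement removes the fixed epoch boundary at which every example has been seen exactly once. In PyTorch I believe the equivalent is passing `torch.utils.data.RandomSampler(dataset, replacement=True)` as the `sampler` of the `DataLoader` instead of using `shuffle=True`.

```python
import random


def epoch_indices_without_replacement(n, rng):
    # Usual shuffling: a random permutation, so each index appears exactly once.
    idx = list(range(n))
    rng.shuffle(idx)
    return idx


def epoch_indices_with_replacement(n, rng):
    # Sampling with replacement: n independent draws, so some indices may
    # repeat and others may not appear at all in a given pass.
    return [rng.randrange(n) for _ in range(n)]


def batches(indices, batch_size):
    # Split a flat list of indices into consecutive mini-batches.
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


rng = random.Random(0)   # hypothetical seed, only to make the run repeatable
n, batch_size = 10, 4    # hypothetical dataset size and batch size

shuffled = epoch_indices_without_replacement(n, rng)
resampled = epoch_indices_with_replacement(n, rng)

print("without replacement:", batches(shuffled, batch_size))
print("with replacement:   ", batches(resampled, batch_size))
```

The first line printed is a permutation of 0..9; the second can contain duplicates, which is the whole point of the suggestion.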