I have another question. If we use a pre-trained model on a new dataset, for example a ResNet50 trained on ImageNet, are the momentum (for both SGD and Adam) and the iteration count (for Adam) already large numbers from the very first epoch on the new dataset? Is that correct?
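To pin down which quantities the question is about, here is a minimal plain-Python sketch of the per-parameter optimizer state: the SGD momentum buffer, and Adam's moment estimates `m`, `v` plus the step counter `t` used for bias correction. The class names, the single scalar weight, and the gradient value are hypothetical illustrations, not PyTorch's actual classes. The sketch assumes that only the weights are taken from the pre-trained checkpoint and that the optimizer is created fresh for the new dataset. Under that assumption, the buffers and `t` start from zero.

```python
import math
from dataclasses import dataclass


@dataclass
class SGDMomentumState:
    """Hypothetical per-parameter state for SGD with momentum."""
    momentum: float = 0.9
    buf: float = 0.0  # velocity buffer; zero for a freshly created optimizer

    def step(self, param: float, grad: float, lr: float) -> float:
        # Accumulate the gradient into the velocity, then move the weight.
        self.buf = self.momentum * self.buf + grad
        return param - lr * self.buf


@dataclass
class AdamState:
    """Hypothetical per-parameter state for Adam."""
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    t: int = 0      # iteration counter used in bias correction
    m: float = 0.0  # first-moment (mean) estimate
    v: float = 0.0  # second-moment (uncentered variance) estimate

    def step(self, param: float, grad: float, lr: float) -> float:
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        # Bias correction depends on t: it matters most when t is small.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Hypothetical weight copied from a pre-trained checkpoint; the optimizer
# state is NOT loaded, only the weight value.
pretrained_w = 0.5
sgd = SGDMomentumState()
adam = AdamState()

# First update on the new dataset.
w_sgd = sgd.step(pretrained_w, grad=2.0, lr=0.1)
w_adam = adam.step(pretrained_w, grad=2.0, lr=0.1)
print(sgd.buf, adam.t)  # prints "2.0 1"
```

In this sketch the first Adam step moves the weight by roughly `lr` in the direction of the gradient's sign, because with `t = 1` the bias correction makes `m_hat / sqrt(v_hat)` close to `grad / |grad|`. If the optimizer state were instead restored from the original training run, `buf`, `m`, `v`, and `t` would carry over.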