ResNet finetuning - choice of optimizer

dronline · February 14, 2018, 11:28am

Hi,

Is there any particular reason to prefer SGD when finetuning a pretrained ResNet? It seems to be the common choice. Is there a heuristic or mathematical reason for that, or has it simply been found empirically to perform best?