ResNet finetuning - choice of optimizer

Hi,

Is there any particular reason to prefer SGD when fine-tuning a pretrained ResNet? It seems to be the common choice. Is there a heuristic or mathematical reason for that, or has it simply been found empirically to perform best?
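To make the comparison concrete, here is a minimal sketch of the two update rules usually being weighed against each other. It assumes plain heavy-ball momentum (the form PyTorch's `torch.optim.SGD` uses with `dampening=0` and `nesterov=False`) and the standard bias-corrected Adam update. The learning rates are illustrative values only, and `first_step_sizes` is a hypothetical helper that applies one step of each rule to a toy quadratic loss. One visible difference is that the SGD step scales with the gradient's magnitude, while Adam's first step is roughly `lr` regardless of that magnitude.

```python
import math


def sgd_momentum_step(w, g, buf, lr=0.01, momentum=0.9):
    """One SGD step with heavy-ball momentum: buf <- momentum*buf + g; w <- w - lr*buf."""
    buf = momentum * buf + g
    return w - lr * buf, buf


def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with bias correction; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def first_step_sizes(a, w0=1.0):
    """Hypothetical helper: size of the first update of each optimizer on
    the toy loss L(w) = 0.5 * a * w**2, whose gradient is a * w."""
    g = a * w0
    w_sgd, _ = sgd_momentum_step(w0, g, buf=0.0)
    w_adam, _, _ = adam_step(w0, g, m=0.0, v=0.0, t=1)
    return abs(w0 - w_sgd), abs(w0 - w_adam)


for a in (1.0, 100.0):
    sgd_step, adam_step_size = first_step_sizes(a)
    print(f"a={a}: SGD step={sgd_step:.4g}, Adam step={adam_step_size:.4g}")
```

With these settings, the SGD step is about 0.01 for `a=1` and about 1.0 for `a=100`, while the Adam step stays near 0.001 in both cases. This per-parameter normalization is one concrete way the two optimizers differ. The sketch shows the difference but does not by itself explain which optimizer suits fine-tuning.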