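Great! But unless you are using plain SGD without momentum as your optimizer, you would also have to reinitialize your optimizer’s state_dict to get comparable results, since optimizers like SGD with momentum or Adam keep internal per-parameter state (e.g. momentum buffers) between steps.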
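A minimal sketch of one way to do this in PyTorch. It snapshots the initial `optimizer.state_dict()` (whose per-parameter state is still empty) next to the model weights, then restores both before a second run. The model, data, learning rate and momentum value are hypothetical and chosen only for illustration. Restoring the weights alone leaves the old momentum buffers in place, so that run diverges from the first one.

```python
import copy

import torch

# Illustrative setup (hypothetical model, data, and hyperparameters).
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

# Snapshot the initial weights and the initial (still empty) optimizer state.
init_model_state = copy.deepcopy(model.state_dict())
init_optim_state = copy.deepcopy(optimizer.state_dict())


def train(steps=5):
    """Run a few SGD steps and return a copy of the resulting parameters."""
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()  # creates/updates the per-parameter momentum buffers
    return [p.detach().clone() for p in model.parameters()]


run1 = train()

# Reset the weights AND the optimizer state: the first run is reproduced.
model.load_state_dict(init_model_state)
optimizer.load_state_dict(init_optim_state)
run2 = train()

# Reset only the weights: the stale momentum buffers change the result.
model.load_state_dict(init_model_state)
run3 = train()

print(all(torch.equal(a, b) for a, b in zip(run1, run2)))
print(all(torch.equal(a, b) for a, b in zip(run1, run3)))
```

Simply re-creating the optimizer with the same arguments would also clear its state. Restoring a saved `state_dict` has one advantage: it keeps whatever hyperparameters the optimizer was originally configured with.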