Hello!
As I understand it, `torch.backends.cudnn.benchmark` makes cuDNN benchmark multiple convolution algorithms the first time each convolution configuration is run (so typically during the first iterations of training) and then reuse the fastest one for the rest of the run. If I checkpoint my model and later resume it, cuDNN has to rerun the benchmark again at the start of the resumed run. Is there a way to save the benchmark results from the first run and load them when resuming training?
Theoretically, since training continues with the same model and the same input shapes, shouldn't the same convolution algorithms still be the fastest when resuming?
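
For context, here is a minimal sketch of the checkpoint/resume flow I mean (the toy conv model, shapes, and `checkpoint.pt` file name are just placeholders, assuming a CUDA GPU is available):

```python
import torch
import torch.nn as nn

torch.backends.cudnn.benchmark = True  # cuDNN times several conv algorithms on first use

# placeholder model/optimizer just to illustrate the flow
model = nn.Conv2d(3, 16, kernel_size=3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# first forward pass triggers the algorithm benchmarking for this conv configuration
x = torch.randn(8, 3, 224, 224, device="cuda")
out = model(x)

# ... train for some epochs, then checkpoint ...
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "checkpoint.pt")

# --- later, in a fresh process, resume training ---
torch.backends.cudnn.benchmark = True
state = torch.load("checkpoint.pt")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])

# the first forward pass of the resumed run re-runs the benchmark,
# even though the model and input shapes are identical to the earlier run
out = model(x)
```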