Trainer Class Predict Func == Decreasing Prediction Time

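I didn’t mean to claim that synchronizations would speed up your code, only that they should be used if you want to profile the actual GPU execution. CUDA kernels are launched asynchronously, so without a synchronization a host timer can stop before the GPU work has finished.
I.e. you should add a synchronization via torch.cuda.synchronize() before starting and before stopping host timers such as time.perf_counter().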
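
As a rough sketch of what that looks like in practice (the helper name timed_call, the warmup/iters values, and the matmul workload are my own placeholders, not from your Trainer code), something along these lines should work:

```python
import time
import torch

def timed_call(fn, *args, warmup=3, iters=10):
    """Hypothetical helper: average host-side time per call of fn(*args)."""
    use_cuda = torch.cuda.is_available()
    # Warm-up iterations so one-time setup costs are not measured.
    for _ in range(warmup):
        fn(*args)
    # Wait for all queued GPU work before starting the host timer.
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        out = fn(*args)
    # Wait for the timed kernels to finish before stopping the timer.
    if use_cuda:
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters
    return out, elapsed

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(512, 512, device=device)  # placeholder workload
out, seconds = timed_call(torch.matmul, x, x)
print(f"{seconds * 1e3:.3f} ms per call on {device}")
```

You would replace torch.matmul with your own predict call. On a CPU-only machine the synchronizations are skipped, since there is nothing asynchronous to wait for.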
