Internally, empty_cache() is called when an out-of-memory error is detected, in an attempt to recover from it. Also, if you are using cudnn with benchmark=True, different algorithms will be profiled in the first iteration, and each of them uses a different amount of memory for its workspace. empty_cache() is also called afterwards to recover this memory and allow other processes to use it.
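To make the recovery step concrete, here is a small pure-Python sketch of the idea (all names here are made up for illustration; the real caching allocator lives in PyTorch's C++ backend and `torch.cuda.empty_cache()` is just the Python-visible entry point). Freed blocks stay in a per-process cache, so the "driver" can run out of memory even though the process holds reusable blocks; on OOM the cache is emptied once and the allocation retried:

```python
class CachingAllocator:
    """Toy model of a caching device allocator: freed blocks stay in a
    per-process cache and are only returned to the driver by empty_cache()."""

    def __init__(self, capacity):
        self.capacity = capacity  # total "device" memory
        self.in_use = 0           # memory handed out to live tensors
        self.cached = []          # freed block sizes still held by the cache

    def device_free(self):
        # Memory the driver still sees as free (the cache counts as used).
        return self.capacity - self.in_use - sum(self.cached)

    def malloc(self, size):
        # Reuse a cached block of the same size if one exists ...
        if size in self.cached:
            self.cached.remove(size)
        # ... otherwise ask the "driver" for fresh memory.
        elif size > self.device_free():
            raise MemoryError("out of memory (toy model)")
        self.in_use += size

    def free(self, size):
        # Freed memory goes back into the cache, not to the driver.
        self.in_use -= size
        self.cached.append(size)

    def empty_cache(self):
        # Release all unused cached blocks back to the driver.
        self.cached.clear()


def malloc_with_retry(alloc, size):
    # Mirrors the internal recovery step: on OOM, empty the cache once
    # and retry the allocation before giving up.
    try:
        alloc.malloc(size)
    except MemoryError:
        alloc.empty_cache()
        alloc.malloc(size)
```

For example, after allocating and freeing three 30-unit blocks on a 100-unit device, a 50-unit request fails (no cached block fits and only 10 units are driver-free), but succeeds once `empty_cache()` releases the cached blocks. This is only a fungibility/fragmentation caricature, not the real block-splitting logic.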
cudnnFind will profile different kernel implementations for the current workload, e.g. the current conv layer. You can imagine it as iterating over conv implementations based on matrix multiplications, FFTs, Winograd, etc. Each of these algorithms (for the same convolution with the given input shapes, padding, dilation, etc.) is executed and its runtime is measured. Once this is done, the fastest one is selected, in the best case.