The benchmark runs the first time you call a convolution with an input size it has not seen before.
So it never really “finishes”: if you later run a convolution with a new shape, that new shape gets benchmarked as well to find the best algorithm for it.
Note that for a given shape, the benchmark happens during the first call. Once that call to conv has returned (and you have synced the device with torch.cuda.synchronize()), there won’t be any more benchmarking for that shape.
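
To make the "once per shape" behaviour concrete, here is a toy model in plain Python. It is only a sketch of the caching idea described above, not how cuDNN or PyTorch implement it: the `BenchmarkCache` class, the two algorithm names, and their cost formulas are all made up for illustration, and keying the cache on the input shape alone is a simplification.

```python
import math
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, ...]


class BenchmarkCache:
    """Toy per-shape autotuning cache (hypothetical, not cuDNN's real code)."""

    def __init__(self, algorithms: Dict[str, Callable[[Shape], float]]):
        # Maps an algorithm name to a made-up cost estimate for a given shape.
        self.algorithms = algorithms
        # Best algorithm found so far, keyed by input shape.
        self.best: Dict[Shape, str] = {}
        # Every shape that triggered a benchmark, in order.
        self.benchmark_runs: List[Shape] = []

    def pick(self, shape: Shape) -> str:
        if shape not in self.best:
            # First call with this shape: try every algorithm, keep the cheapest.
            self.benchmark_runs.append(shape)
            self.best[shape] = min(
                self.algorithms, key=lambda name: self.algorithms[name](shape)
            )
        # Later calls with the same shape reuse the cached choice.
        return self.best[shape]


# Two invented algorithms: one with a high fixed cost that scales well,
# one with a low fixed cost that scales worse.
algorithms = {
    "algo_a": lambda s: 1000 + 1 * math.prod(s),
    "algo_b": lambda s: 10 + 2 * math.prod(s),
}

cache = BenchmarkCache(algorithms)
small, big = (1, 3, 8, 8), (8, 3, 32, 32)

cache.pick(small)  # benchmarks `small`
cache.pick(small)  # cached, no benchmark
cache.pick(big)    # new shape, so it benchmarks again

print(cache.benchmark_runs)
```

Only two benchmarks are recorded for the three calls, one per distinct shape. If your input sizes keep changing, every new size pays the benchmark cost again.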