Trace cudnn.benchmark algo selection

I’m writing a generic converter for Caffe-trained models.

The model I’m trying it on is a VGG-16 with 3x3 convs, plus ROI pooling. As a consequence, it
runs with batch_size == 1; after ROI pooling the effective batch size becomes around 2K, so the Linear layers have to do a lot of work.

I am observing a huge perf degradation (around 2x slower) in model evaluation when I set cudnn.benchmark = True. The gap narrows somewhat over time.
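For context, here is a minimal sketch (assuming PyTorch with a CUDA device; the conv layer is just an illustrative stand-in) of the flag in question and the varying-shape pattern that can make it slow, since cuDNN re-runs its algorithm search for every new input shape it sees:

```python
import torch

# cudnn.benchmark makes cuDNN time the available algorithms the first
# time it sees a new input configuration, then caches the winner.
torch.backends.cudnn.benchmark = True

if torch.cuda.is_available():
    conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()
    # Each new input shape triggers a fresh benchmarking pass, so a
    # workload with changing shapes pays the search cost repeatedly --
    # which would also explain the gap shrinking as the cache warms up.
    for h in (224, 225, 226):
        x = torch.randn(1, 3, h, h, device="cuda")
        conv(x)
```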

Is there a way to find out which algorithms were actually selected?
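One avenue I haven't fully explored is cuDNN's own API logging, which (if I read the cuDNN docs right) dumps every cuDNN call, including the algo enum chosen for each convolution. Something like the following, where `eval.py` stands in for my evaluation script:

```shell
# cuDNN API logging: print all cuDNN calls (and their parameters,
# including the selected conv algorithms) to stdout.
CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stdout python eval.py
```

The output is very verbose, so grepping for the convolution-forward calls would probably be necessary.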