I’m writing a generic converter of Caffe-trained models.
The model I’m testing it on is a VGG-16 with 3x3 convs that uses ROI pooling. As a consequence, the network runs with batch_size == 1 up to the ROI pooling layer; after it, the effective batch size jumps to around 2K, so the Linear layers end up doing most of the work.
I’m observing a large performance degradation (around 2x slower) in model evaluation when I set
cudnn.benchmark = True. The gap narrows somewhat as evaluation keeps running.
Is there a way to find out which algorithms cuDNN actually selects?
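One option I’ve come across (an assumption on my side, not something specific to the converter): cuDNN itself can log its API calls, including the convolution algorithms it picks, via environment variables, provided the cuDNN version is recent enough (7.4+) to support API logging. The variables must be set before cuDNN is initialized, i.e. before the process launches. A minimal sketch (`eval_model.py` is a hypothetical stand-in for the actual evaluation script):

```shell
# Enable cuDNN API logging (assumption: cuDNN >= 7.4).
export CUDNN_LOGINFO_DBG=1
# Log destination: a file path, or stdout / stderr.
export CUDNN_LOGDEST_DBG=cudnn.log

# Run the evaluation; cudnn.log will then contain the cudnnConvolutionForward
# calls with the algo enum values that were actually chosen.
python eval_model.py
```

Grepping the resulting log for the algo arguments of the convolution calls should show what `cudnn.benchmark = True` settled on for each input shape.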