Hi, I’m experiencing a huge speed drop when running the exact same code on Pascal cards (like the 1080, Titan Xp, or P100) versus newer cards (Titan V or 2080 Ti). Both setups run PyTorch 1.5 with CUDA 10.2 and the cuDNN 7.6.5 shipped with PyTorch.
After digging into the code, I found that the problem is the dilated convolutions in my network.
I’m running a task similar to image segmentation and I’m using a 2D ResNet as my network. For comparison, I turned off shuffling on my dataloaders, so the input sizes are exactly the same across all my benchmarks.
When I set all dilations to 1 throughout the network, the speed is quite similar across cards: around 30~35 seconds per 100 iterations on both the Pascal cards and the newer ones.
However, when I turn dilation on, speed drops significantly on the Pascal cards, to around 140 seconds per 100 iterations, while on the Titan V it’s still around 40 seconds per 100 iterations.
In short, speed is similar when dilation is off, but the Pascal cards are 3 to 4 times slower when dilation is on.
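To isolate the issue from the rest of my network, here is a minimal timing sketch of the kind I used (the layer sizes and iteration counts are illustrative, not my real model):

```python
import time
import torch
import torch.nn as nn

def bench(dilation, iters=100, batch=8, channels=64, size=128, warmup=10):
    """Time forward+backward of a single 3x3 conv with the given dilation."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # padding = dilation keeps the spatial size constant for a 3x3 kernel
    conv = nn.Conv2d(channels, channels, kernel_size=3,
                     padding=dilation, dilation=dilation).to(device)
    x = torch.randn(batch, channels, size, size, device=device)

    # warm-up so one-time initialization doesn't skew the timing
    for _ in range(warmup):
        conv(x).sum().backward()
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(iters):
        conv(x).sum().backward()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.time() - start

if __name__ == "__main__":
    print("dilation=1:", bench(1))
    print("dilation=2:", bench(2))
```

On the Pascal cards the `dilation=2` case is the one that blows up, while on the Titan V both cases take roughly the same time.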
I found a similar topic here discussing the cuDNN backend.
However, since my input size changes between batches, setting torch.backends.cudnn.benchmark to True just slows things down further. Any help here?
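For reference, this is the setting I tried. My understanding is that benchmark mode autotunes the convolution algorithm per input shape, so with varying shapes the autotuning cost is paid again for every new shape, which is presumably why it made things worse for me:

```python
import torch

# Autotune conv algorithms per input shape. Helps only when shapes are
# fixed; with varying shapes the repeated autotuning adds overhead.
torch.backends.cudnn.benchmark = True

# Default: pick an algorithm heuristically without autotuning.
torch.backends.cudnn.benchmark = False
```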