With `torch.backends.cudnn.benchmark = True` added, here are the results!
vgg16 passes:
- 1080 Ti: 41.4 ms
- Titan V: 31.3 ms

resnet152 passes:
- 1080 Ti: 60.4 ms
- Titan V: 49.0 ms

densenet121 passes:
- 1080 Ti: 29.9 ms
- Titan V: 26.2 ms
Looks like adding this magic line works!
Thanks @Soumith_Chintala!
As a user, I was expecting that simply swapping in a Titan V would be at least a little faster than a 1080 Ti without modifying my code. (I posted a toy example here, but my VGG16-based MVCNN used for object classification also ran much slower, which is why I posted this.)
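For anyone else landing here, this is roughly how I'd wire the magic line into a timing script. This is a minimal sketch: the tiny conv model is a hypothetical stand-in for vgg16/resnet152/densenet121 (to keep it self-contained), and it falls back to CPU when no GPU is present, in which case the cuDNN setting has no effect.

```python
import time

import torch
import torch.nn as nn

# The "magic line": with fixed input shapes, cuDNN benchmarks several
# convolution algorithms on the first pass and caches the fastest.
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in model; the numbers above came from real
# torchvision models (vgg16, resnet152, densenet121).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
).to(device).eval()

x = torch.randn(8, 3, 224, 224, device=device)

with torch.no_grad():
    model(x)  # warm-up pass: triggers the cuDNN algorithm search
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels launch asynchronously
    start = time.perf_counter()
    for _ in range(10):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed_ms = (time.perf_counter() - start) / 10 * 1000

print(f"avg forward pass: {elapsed_ms:.1f} ms on {device}")
```

Note the warm-up pass and the `torch.cuda.synchronize()` calls: without the warm-up, the algorithm search gets counted in the timing, and without synchronizing, you'd only measure kernel launch time.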