Performance (time) difference between different PyTorch versions

Hi,
I am porting some of my old code (PyTorch 0.4.0) to the latest PyTorch version. There is hardly any syntactic difference, but I have observed that some architectures, such as an autoencoder, train much slower on the latest PyTorch, while a classification network (e.g. an off-the-shelf ResNet from torchvision models) takes the same amount of time to train.

I have tried setting torch.backends.cudnn.benchmark=True, but that doesn't help.
I have also tried reinstalling conda and recreating the environments, but it makes no difference.
My machine runs Ubuntu 18.04 with an Nvidia GTX 1080 Ti.
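For reference, enabling the flag is just the usual one-liner before training starts:

```python
import torch

# Let cuDNN benchmark the available convolution algorithms and cache the
# fastest one per input shape (most useful when input sizes do not change).
torch.backends.cudnn.benchmark = True
```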

For a sample run, I took this simple convolutional autoencoder from [here](https://github.com/L1aoXingyu/pytorch-beginner/tree/master/08-AutoEncoder) and trained it on MNIST.
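Roughly, the model looks like this (an illustrative sketch, not an exact copy of the repo's layer sizes):

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Small conv autoencoder for 1x28x28 MNIST images (layer sizes are illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(True),
            nn.Conv2d(16, 8, 3, stride=2, padding=1),   # 14x14 -> 7x7
            nn.ReLU(True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2, padding=1, output_padding=1),   # 7x7 -> 14x14
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```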

With pytorch 0.4.0, torchvision 0.2.1, cudatoolkit 9.0, cudnn 7.6.0, each epoch takes ~4 seconds.
With pytorch 1.1.0, torchvision 0.3.0 and cudatoolkit 10.0, cudnn 7.5.1, each epoch takes ~7 seconds.
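The epoch times are plain wall-clock measurements around the training loop, roughly like this (a sketch; batch size, learning rate and normalization are assumptions, ConvAutoencoder is the class sketched above, and the torch.cuda.synchronize() calls make sure the GPU work has finished before the clock is read):

```python
import time
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative setup: batch size, lr and normalization are assumptions,
# not necessarily the values used in the linked repo.
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
loader = DataLoader(datasets.MNIST('./data', train=True, download=True,
                                   transform=transform),
                    batch_size=128, shuffle=True)

model = ConvAutoencoder().cuda()   # class from the sketch above
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    torch.cuda.synchronize()       # drain any pending GPU work first
    start = time.time()
    for img, _ in loader:
        img = img.cuda()
        out = model(img)
        loss = criterion(out, img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()       # wait for the last kernels before stopping the clock
    print('epoch {}: {:.2f} s'.format(epoch, time.time() - start))
```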

Let me know if anyone has observed a similar issue; any help in rectifying it is appreciated.

I don't know if that's the issue, but your 0.4.0 install comes with cuDNN 7.6.0, which should be faster than the cuDNN version shipped with your PyTorch 1.1.0, according to this link. If you use GPUs, the operations are often dispatched to cuDNN ops.
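You can verify which versions are actually picked up at runtime with something like:

```python
import torch

# Show the CUDA / cuDNN versions the current PyTorch build actually uses.
print(torch.__version__)
print(torch.version.cuda)                # e.g. '10.0.130'
print(torch.backends.cudnn.version())    # e.g. 7501 for cuDNN 7.5.1
```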

That's probably the issue, I guess.
But that's what gets installed with the respective versions of PyTorch and CUDA, so it seems like I will have to build PyTorch from source.