Why is PyTorch 1.2 50% slower than PyTorch 1.0?

I’m at a loss. I just upgraded from PyTorch 1.0 to PyTorch 1.2 and am seeing huge slowdowns in training (50%-70%). What could be causing this discrepancy? I’m building my docker container off the official nvidia/cuda:10.0 image and haven’t changed anything except upgrading PyTorch.

The command for upgrading was:

conda install -c pytorch pytorch=1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0 torchvision
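For what it’s worth, this is how I’m confirming which versions are actually active inside the container (just printing the standard torch version attributes):

import torch

# Versions the running environment actually picked up
print(torch.__version__)                # e.g. 1.2.0
print(torch.version.cuda)               # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())   # cuDNN version, e.g. 7602
print(torch.cuda.get_device_name(0))    # GPU visible inside the container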

What am I missing?

Do you have a code snippet so that we can reproduce this issue?

Hmm… not easily, unfortunately. :frowning: I’m using a large custom dataset for segmentation with SGD and BCE + Dice, so nothing too crazy there. The model I’m using is here. I’ll try another model just to see if that’s where the issue lies.

Confirming that this particular model trains much slower in PyTorch 1.2. Any ideas what specifically in it could be causing the slowdown? At first glance there doesn’t seem to be anything incredibly different about it.

Could you try to profile specific parts of the model and isolate a submodule?
I assume we could use some random dummy data for profiling, if the slowdown is created in the model itself?
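Something along these lines should work as a starting point (a minimal timing sketch with dummy inputs; MyModel, the input shape, and the plain BCE criterion are placeholders for your actual setup):

import time
import torch

model = MyModel().cuda()    # placeholder for your segmentation model
x = torch.randn(4, 3, 512, 512, device='cuda')                     # dummy batch; adjust to your data
target = torch.randint(0, 2, (4, 1, 512, 512), device='cuda').float()  # dummy target matching the output shape
criterion = torch.nn.BCEWithLogitsLoss()

# Warm-up iterations so cuDNN autotuning etc. don't skew the timing
for _ in range(10):
    out = model(x)
    loss = criterion(out, target)
    loss.backward()

torch.cuda.synchronize()
start = time.time()
for _ in range(50):
    model.zero_grad()
    out = model(x)
    loss = criterion(out, target)
    loss.backward()
torch.cuda.synchronize()
print('avg iteration time:', (time.time() - start) / 50)

Running the same script under both installs should show whether the gap is in the model itself or elsewhere in the training pipeline.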

OK, I think I’ve narrowed it down to the basic ResNeXt101_64x4d class, since using a different backbone (e.g. DenseNet) does not produce the slowdown. I’m guessing the slowdown is in the features. Still bewildered as to why that would cause such a drastic change in performance, though.
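To narrow it down further, I’m running just the backbone through the autograd profiler and comparing the op-level timings between 1.0 and 1.2 (this assumes the ResNeXt feature extractor is exposed as model.features and takes a standard 3-channel input):

import torch

backbone = model.features.cuda()   # assumption: the ResNeXt101_64x4d feature extractor
x = torch.randn(4, 3, 224, 224, device='cuda')

# Warm-up so one-time setup costs don't show up in the profile
for _ in range(5):
    backbone(x).sum().backward()

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    backbone(x).sum().backward()
print(prof.key_averages().table(sort_by='cuda_time_total'))

Comparing the top entries of the two tables should show which ops got slower.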