Recently, I have started working on larger images (brain scans), and I found a big slowdown going from size (1, 2, 224, 224, 224) to (1, 2, 256, 256, 256). I was expecting things to be slower, but not by this amount:
I think it could be that you’re missing the use_cuda flag (see the profiler docs).
This would mean that for all CUDA ops, you only measure the time to launch the kernel, not the time for it to actually run. And operations that create a synchronization point (like a copy to the CPU) will appear to have a large runtime just because they wait for the rest of the work to actually finish on the GPU.
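For example, something like this (a minimal sketch, not your actual network; the Conv3d layer and sizes are just placeholders) records the CUDA kernel times rather than only the launch times:

```python
# Minimal sketch: profile a 3D conv with use_cuda=True so GPU kernel times are
# recorded, not just the time to launch the kernels from the CPU.
import torch
import torch.nn as nn

conv = nn.Conv3d(2, 16, kernel_size=3, padding=1).cuda()
x = torch.randn(1, 2, 256, 256, 256, device="cuda")

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    y = conv(x)

torch.cuda.synchronize()  # make sure all GPU work has finished
print(prof.key_averages().table(sort_by="cuda_time_total"))
```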
I would say this is most likely because cuDNN’s default algorithm selection changes once a dimension goes above 255.
You can set torch.backends.cudnn.benchmark = True so that cuDNN benchmarks the available algorithms and picks the fastest one for your input size/hardware. That should smooth things out.
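A rough sketch of how you might check this (again with a placeholder conv layer, not your real model):

```python
# Minimal sketch: enable cuDNN autotuning and time the forward pass with
# explicit synchronization so we measure actual GPU runtime.
import time
import torch
import torch.nn as nn

torch.backends.cudnn.benchmark = True  # let cuDNN pick the fastest algorithm per input shape

conv = nn.Conv3d(2, 16, kernel_size=3, padding=1).cuda()
x = torch.randn(1, 2, 256, 256, 256, device="cuda")

# First call is slower: cuDNN tries the candidate algorithms and caches the winner.
conv(x)
torch.cuda.synchronize()

start = time.time()
conv(x)
torch.cuda.synchronize()  # wait for the GPU before reading the clock
print(f"forward: {time.time() - start:.4f}s")
```

Note that the benchmarking cost is paid again whenever the input shape changes, so this helps most when your sizes stay fixed across iterations.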