Unexpected genericTranspose_kernel when enabling channels_last in PyTorch with AMP

Hi experts:

I enabled channels last by following the tutorial "(beta) Channels Last Memory Format in PyTorch" (PyTorch Tutorials 2.2.0+cu121 documentation), and I use AMP at the same time. When I profiled the model with Nsight Systems, I found a lot of unexpected genericTranspose_kernel launches when the input shape is [3, 512, 960].
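For reference, here is a minimal sketch of the kind of setup described above. The original model is not shown in this thread, so the single convolution layer, the batch size of 1, and the names `model`, `x`, and `y` are all placeholders. The snippet falls back to CPU with bfloat16 when no GPU is available, only so that it runs anywhere.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs without a GPU (assumption, not from the post).
device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Hypothetical stand-in model; the real network is not shown in the post.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
).to(device)
model = model.to(memory_format=torch.channels_last)

# One image with the reported [3, 512, 960] shape, laid out as N, C, H, W.
x = torch.randn(1, 3, 512, 960, device=device)
x = x.to(memory_format=torch.channels_last)

# Run the forward pass under autocast (AMP).
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    y = model(x)

print(y.shape, y.dtype)
print("input is channels_last:", x.is_contiguous(memory_format=torch.channels_last))
```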

The genericTranspose_kernel launches appear around the CUTLASS kernels.
To find out why these kernels show up, I changed the input shape. With an input shape of [32, 128, 240], the genericTranspose_kernel launches are gone.

Any insights on that? Thank you!


cuDNN might not have native channels-last kernels for the input shapes you provided. In that case it would transpose the input to a layout it supports instead of failing, which would explain the extra genericTranspose_kernel launches.
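One way to check this would be to profile the same layer with both input shapes and look for events whose names contain "transpose". The snippet below is only a sketch: `transpose_events` is a hypothetical helper, the single convolution layer is a stand-in for the real model, and the name filter is a guess based on the kernel name reported above. On a machine without a GPU it runs on CPU and only sees CPU-side operator names, so the comparison is meaningful only on CUDA.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity


def transpose_events(in_channels, height, width):
    """Return sorted names of profiled events containing 'transpose'.

    Hypothetical helper: profiles one channels_last conv under autocast.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

    # Stand-in layer; the real model is not shown in the thread.
    conv = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1).to(device)
    conv = conv.to(memory_format=torch.channels_last)
    x = torch.randn(1, in_channels, height, width, device=device)
    x = x.to(memory_format=torch.channels_last)

    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
        conv(x)  # warm-up, so one-time setup work is not profiled
        with profile(activities=activities) as prof:
            conv(x)
            if device == "cuda":
                torch.cuda.synchronize()

    names = {e.key for e in prof.key_averages()}
    return sorted(n for n in names if "transpose" in n.lower())


# Compare the two shapes from the question (channels, height, width).
for shape in [(3, 512, 960), (32, 128, 240)]:
    print(shape, transpose_events(*shape))
```

If the transpose kernels show up only for the 3-channel input, that would be consistent with the explanation above, although it would not prove which library inserted them.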
