How is permutation implemented in PyTorch CUDA?

Hi, I am interested in finding out how PyTorch implements permutations on CUDA. However, I cannot find the implementation in the repo.

I assumed that there should be a GPU kernel function for permutation in PyTorch/aten/src/ATen/native/cuda/, but I didn’t find it in either TensorTransformation.cu or TensorFactories.cu.

Could someone point me to the implementation? Thanks.

Fei

Take a look here:

Hi LeviViana,

Thanks for getting back to me so quickly. However, I think I didn’t express my question clearly. I was hoping to find the implementation of dimension permutation, as in:

a = a.permute(2, 0, 1),

where a is a 3D PyTorch tensor, and the code permutes a’s dimensions so that the innermost dimension (2) becomes the outermost (0).

In terms of shape and values, this is equivalent to:

a = a.transpose(1, 2).transpose(0, 1).contiguous()
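A quick way to check that claim, as a sketch assuming a recent PyTorch install (the shapes are arbitrary): both expressions give the same shape and values, but only the second one is contiguous in memory, because `permute` returns a view while `.contiguous()` makes a copy.

```python
import torch

# A small 3-D tensor with distinct values, so any mismatch would show up.
a = torch.arange(2 * 3 * 4).reshape(2, 3, 4)

p = a.permute(2, 0, 1)                              # a view, shape (4, 2, 3)
t = a.transpose(1, 2).transpose(0, 1).contiguous()  # a copy, shape (4, 2, 3)

print(p.shape, t.shape)                      # same shape for both
print(torch.equal(p, t))                     # True: same values at every index
print(p.is_contiguous(), t.is_contiguous())  # False True
```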

If I understood it correctly, the code you pointed to is for random permutations, not permutations of dimensions.

Fei


Permutation calls transpose.

Hi JuanFMontesinos,

Thanks for making that clear.

But how does PyTorch implement transpose on GPU? :slight_smile:

I didn’t find the source code in either of these directories:
pytorch/aten/src/ATen/native/cuda/
pytorch/aten/src/ATen/native/cudnn/
It would be great if you could point me to the correct file.

Fei

torch.permute() works simply by reordering the sizes and strides of the dimensions (similar to NumPy), so no data is copied.

Just to quote from https://ipython-books.github.io/45-understanding-the-internals-of-numpy-to-avoid-unnecessary-array-copying/ :

When reshaping an array, NumPy avoids copies when possible by modifying the strides attribute. For example, when transposing a matrix, the order of strides is reversed, but the underlying data remains identical
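To see the same stride trick in PyTorch itself, here is a small sketch (assuming any recent PyTorch version; the shapes are arbitrary). It permutes a tensor and inspects its strides and memory address. The same holds for CUDA tensors, since a view only changes the tensor's metadata.

```python
import torch

a = torch.arange(24).reshape(2, 3, 4)   # contiguous, sizes (2, 3, 4)
p = a.permute(2, 0, 1)

print(a.stride())   # (12, 4, 1)
print(p.stride())   # (1, 12, 4): a's strides reordered as (2, 0, 1)
print(p.shape)      # torch.Size([4, 2, 3])

# No data was moved: both tensors start at the same memory address.
print(a.data_ptr() == p.data_ptr())   # True

# Writing through the view also changes the original, since p[i, j, k] is a[j, k, i].
p[0, 0, 0] = -1
print(a[0, 0, 0].item())   # -1
```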

Here you can look at the code:


Hi InnovArul,

Thanks for addressing it. That is indeed an interesting insight.

However, what if the operation following the permutation requires the data to be laid out in a certain way? For instance, in the DeepSpeech model, a permutation is used right before the RNN layers. If the data before the permutation is column-major, and the RNN input also has to be column-major (as required by cudnnRNNForwardTraining), then we cannot just change the strides for the permutation (the data is no longer column-major if the stride of the last dimension is not 1).

Is this the case where some sort of data copying has to happen?
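A small sketch of this situation, assuming a recent PyTorch (the shapes are arbitrary and not taken from DeepSpeech). After `permute`, the last dimension no longer has stride 1, and `.contiguous()` is what produces a densely packed copy. When the input is already contiguous, `.contiguous()` returns the tensor itself, so the copy only happens when the layout actually requires it.

```python
import torch

x = torch.randn(2, 3, 4)      # contiguous, strides (12, 4, 1)
y = x.permute(2, 0, 1)        # view with strides (1, 12, 4)

# A consumer that needs a unit-stride last dimension cannot use y as-is.
print(y.is_contiguous(), y.stride()[-1])   # False 4

# Already contiguous: no copy, same memory.
print(x.contiguous().data_ptr() == x.data_ptr())   # True

# Not contiguous: new buffer, elements rewritten in y's logical order.
z = y.contiguous()
print(z.stride(), z.data_ptr() != x.data_ptr())   # (6, 3, 1) True
print(torch.equal(y, z))                          # True: same values, new layout
```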

P.S. Documentation showing that cudnnRNNForwardTraining requires column-major input, from the cuDNN Developer Guide:

https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnRNNForwardTraining

Input vectors are expected to be arranged in the column-major order so strides in xDesc should be set as follows: strideA[0]=inputSize, strideA[1]=1, strideA[2]=1.

========= update ================
I realized that in cases like this, .contiguous() is called to copy the data into column-major form. I am now looking at how PyTorch implements copy(). No outstanding questions for now :slight_smile:
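Conceptually, turning a permuted view into contiguous memory means visiting every output index in order (last index fastest) and reading from the source offset computed from the view's strides. The pure-Python sketch below illustrates only that index arithmetic. It is not PyTorch's copy kernel (the real one is written in C++/CUDA and heavily optimized), and the helper names `permute_view` and `strided_to_contiguous` are hypothetical.

```python
from itertools import product


def permute_view(sizes, strides, dims):
    """Hypothetical helper: permuting only reorders sizes and strides; no data is touched."""
    return [sizes[d] for d in dims], [strides[d] for d in dims]


def strided_to_contiguous(storage, sizes, strides, offset=0):
    """Hypothetical helper: gather a strided view into a new, densely packed flat list.

    storage: flat list holding the underlying data
    sizes, strides: shape and strides of the view, in elements
    """
    out = []
    # Visit output indices with the last index varying fastest.
    for idx in product(*(range(s) for s in sizes)):
        src = offset + sum(i * st for i, st in zip(idx, strides))
        out.append(storage[src])
    return out


# A contiguous 2x3x4 tensor holding 0..23, strides (12, 4, 1).
storage = list(range(24))
sizes, strides = [2, 3, 4], [12, 4, 1]

p_sizes, p_strides = permute_view(sizes, strides, (2, 0, 1))
print(p_sizes, p_strides)   # [4, 2, 3] [1, 12, 4]

packed = strided_to_contiguous(storage, p_sizes, p_strides)
print(packed[:6])           # [0, 4, 8, 12, 16, 20]
```

The permute step is free (two small lists are reordered), while the copy step touches every element; that difference is why PyTorch can defer the copy until an operation such as a cuDNN call actually needs a packed layout.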

Thanks,

Fei