Hi, I am interested in finding out how PyTorch's CUDA backend implements permutations. However, I cannot find it in the repo.

I assumed there would be a GPU kernel function for permutation in pytorch/aten/src/ATen/native/cuda/, but I didn't find it in either TensorTransformation.cu or TensorFactories.cu.

Could someone point me to the implementation? Thanks.

Thanks for getting back to me so quickly. However, I think I didn't express my question clearly. I was hoping to find the implementation of permutation, as in:

a = a.permute(2, 0, 1)

where a is a 3D PyTorch tensor, and the call permutes a's dimensions so that the innermost dimension (2) becomes the outermost (0).

The code is equivalent to:

a = a.transpose(1, 2).transpose(0, 1).contiguous()

The code you pointed to seems to be for random permutations, not permutations of dimensions, if I understood it correctly.

I didn't find the source code in
pytorch/aten/src/ATen/native/cuda/
pytorch/aten/src/ATen/native/cudnn/
It would be great if you could point me to the correct file.

When reshaping an array, NumPy avoids copies when possible by modifying the strides attribute. For example, when transposing a matrix, the order of the strides is reversed, but the underlying data remains identical.
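A short sketch of this in NumPy: transposing reverses the strides, and both views share the same buffer, so no data is copied.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)
print(a.strides)   # (24, 8): 24 bytes to the next row, 8 bytes to the next column

t = a.T            # transpose: no copy, just reversed strides
print(t.strides)   # (8, 24)

# The two views share the same underlying buffer:
a[0, 1] = 99
assert t[1, 0] == 99
assert np.shares_memory(a, t)
```

PyTorch's permute behaves the same way on the metadata level: it returns a view with rearranged sizes and strides rather than launching a kernel.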

Thanks for addressing it. Those are indeed interesting insights.

However, what if the operation following the permutation requires the data to be packed in a certain way? For instance, in the DeepSpeech model, a permutation is used right before the RNN layers. If the data before the permutation is column-major, and the RNN input has to be column-major (as required by cudnnRNNForwardTraining), then we cannot just change the strides for the permutation (the data is no longer column-major if the stride of the last dimension is not 1).

Is this a case where some sort of data copying has to happen?

P.S. Proof that cudnnRNNForwardTraining requires the input to be column-major, from the cuDNN documentation:

Input vectors are expected to be arranged in the column-major order so strides in xDesc should be set as follows: strideA[0]=inputSize, strideA[1]=1, strideA[2]=1.
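The stride problem is easy to see in miniature. A hedged sketch with NumPy (used here only as an analogy for the stride bookkeeping; the float32 shape and sizes are made up for illustration): after a dimension permutation, the view's last axis no longer has stride 1 in elements, so the data is not packed the way a consumer such as cudnnRNNForwardTraining expects.

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.float32)  # contiguous: element strides (12, 4, 1)
p = a.transpose(2, 0, 1)                   # view only; no data movement

# Convert byte strides to element strides for readability.
elem_strides = tuple(s // a.itemsize for s in p.strides)
assert elem_strides == (1, 12, 4)          # last-axis stride is 4, not 1
assert not p.flags['C_CONTIGUOUS']         # the view is no longer packed
```

Since stride tweaking alone cannot make the last-axis stride 1 again without moving elements, a real copy is the only way to satisfy such a layout requirement.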

========= update ================
I realized that in cases like this, the .contiguous() function is used to copy the data into column-major form. I am looking at how PyTorch implements copy() at the moment. No outstanding questions for now.
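For anyone following along, here is a sketch of what .contiguous() does, using NumPy's ascontiguousarray as an analogue (an assumption for illustration, not PyTorch's actual code path): the strided view is materialized into a fresh, packed buffer.

```python
import numpy as np

a = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
p = a.transpose(2, 0, 1)       # non-contiguous view; shares a's buffer
c = np.ascontiguousarray(p)    # copies the elements into a packed layout

assert not p.flags['C_CONTIGUOUS']
assert c.flags['C_CONTIGUOUS']
assert not np.shares_memory(a, c)   # the copy owns new memory
assert np.array_equal(p, c)         # same values, different layout
```

In PyTorch the analogous call is p.contiguous(), which is a no-op when the tensor is already packed and otherwise dispatches to the copy kernels.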