How do I know randperm is performed on GPU

You can specify the desired device in torch.randperm and check which CUDA kernels are launched, e.g. via:

nsys nvprof python -c "import torch; torch.randperm(10, device='cuda')"

Output:

CUDA Kernel Statistics:

 Time(%)  Total Time (ns)  Instances  Average  Minimum  Maximum                                                  Name                                                
 -------  ---------------  ---------  -------  -------  -------  ----------------------------------------------------------------------------------------------------
    42.1            7,425          1  7,425.0    7,425    7,425  void at::cuda::detail::cub::DeviceRadixSortSingleTileKernel<at::cuda::detail::cub::DeviceRadixSortP…
    16.3            2,880          1  2,880.0    2,880    2,880  void at::native::(anonymous namespace)::distribution_elementwise_grid_stride_kernel<unsigned int, 4…
    14.3            2,528          1  2,528.0    2,528    2,528  void at::native::vectorized_elementwise_kernel<4, at::native::FillFunctor<long>, at::detail::Array<…
    13.8            2,432          1  2,432.0    2,432    2,432  void (anonymous namespace)::elementwise_kernel_with_index<int, at::native::arange_cuda_out(c10::Sca…
    13.4            2,368          1  2,368.0    2,368    2,368  void (anonymous namespace)::randperm_handle_duplicate_keys_kernel<int, at::native::(anonymous names…
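If you don't want to run nsys, a minimal sketch (my own addition, not part of the profiler run above) that checks the output device and lists the launched kernels via torch.profiler should show the same information:

import torch
from torch.profiler import profile, ProfilerActivity

# Quick sanity check: the returned tensor should live on the GPU.
x = torch.randperm(10, device="cuda")
print(x.device)  # e.g. cuda:0

# Profile the call and print the CUDA kernels it launched.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    torch.randperm(10, device="cuda")

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

The printed table should contain the same randperm-related kernels (radix sort, fill, arange, etc.) as the nsys output above.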