How to convert PyTorch code into equivalent CUDA code?

I need CUDA code for the pre-trained AlexNet, GoogleNet, and VGG. Is there any utility that converts PyTorch code into equivalent CUDA code? Kindly help. Thanks.

PyTorch code cannot be directly translated to CUDA code, as the CUDA kernels are only a part of the overall model.
What is your use case?

Thank you for your response. I want to run AlexNet and VGG inference on the GPGPU Simulator, but the simulator does not support PyTorch directly. So I want to convert the pre-trained models from PyTorch into CUDA code and then use them on the simulator for inference. Is there any other possible way? Kindly help.

I’m not familiar with the GPGPU Simulator, but from a quick look at it, I’m not sure if this library can be used to simulate a complete application or whether it’s rather intended for specific kernels.

@ptrblck could you elaborate further? For example, why is it possible to get Triton code from torch.compile but not CUDA code?

You can read the native CUDA kernels in the code base, but note the following:

- Exporting the host as well as the device code to CUDA won’t be possible, as you need to execute the host code by definition.
- If you are only asking about the native CUDA kernels, you can find them in the repository.
- The CUDA math libraries (e.g. cuBLAS and cuDNN) are closed-source, so you won’t be able to read their code.

Thank you so much.

I am wondering why it is not possible to convert a PyTorch nn.Module into its equivalent CUDA code, since my understanding is that this (or something similar) happens under the hood to run on my Nvidia GPU. Or, alternatively, why torch.compile does not offer this option.

> as the CUDA kernels are only a part of the overall model.
Could you clarify further? I thought:

import torch

class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.T + x)

would be exactly equivalent to a pipeline of one or more kernels written in CUDA.
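To illustrate the "host code" point made above: even if every op maps to a device kernel, something still has to decide which kernels to launch and in what order, and that sequencing logic is the host program itself. Here is a minimal pure-Python sketch of that split; the "kernels" below are plain-Python stand-ins for real CUDA kernels, not anything PyTorch actually generates.

```python
# Toy sketch: each op in the forward pass corresponds (roughly) to a
# device kernel, while the host code sequences the launches.
# These "kernels" are plain-Python stand-ins, not real CUDA.

def transpose_kernel(a):
    # stand-in for a transpose kernel: returns a.T
    return [list(row) for row in zip(*a)]

def matmul_kernel(a, b):
    # stand-in for a matmul kernel: C = A @ B (square, row-major lists)
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add_kernel(a, b):
    # stand-in for an elementwise-add kernel
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu_kernel(a):
    # stand-in for an elementwise ReLU kernel
    return [[max(x, 0.0) for x in row] for row in a]

def forward(x):
    # Host-side orchestration: this is the part that cannot simply be
    # "exported to CUDA", because it is the program issuing the launches.
    y = matmul_kernel(x, transpose_kernel(x))  # x @ x.T
    y = add_kernel(y, x)                       # + x
    return relu_kernel(y)                      # torch.relu(...)

x = [[1.0, -2.0], [3.0, 4.0]]
print(forward(x))  # [[6.0, 0.0], [0.0, 29.0]]
```

So "a pipeline of kernels" is only half of the picture: the kernels compute, but the host code (here the body of `forward`) is what ties them into a model, which is why exporting a complete nn.Module as standalone CUDA code is not straightforward.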