Is there any way to add a customized CUDA kernel generator based on the output from Torch.FX.graph?
For instance, I define a simple CNN model with two layers, and my goal is to generate CUDA code for a specialized convolutional kernel (let’s say conv_new) to replace the original convolution based on FX.graph IR.
Should I first build a customized operator in PyTorch as an extension and then replace the original kernel in the model with the new one? Or is there any codegen for FX.graph that can generate the kernel directly from templates?
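To make the first option concrete, here is a minimal sketch of what I mean by replacing the convolution at the FX level. It is only an illustration under some assumptions: `TwoLayerCNN` and `replace_convs` are names I made up for this example, and `conv_new` is a hypothetical placeholder that simply falls back to `F.conv2d`. In a real setup it would presumably call into a custom CUDA operator built as a PyTorch extension. The sketch also assumes the default zero padding mode on the `nn.Conv2d` layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.fx as fx


class TwoLayerCNN(nn.Module):
    """Toy two-layer CNN (hypothetical model used only for this sketch)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv2(torch.relu(self.conv1(x)))


def conv_new(x, weight, bias, stride, padding, dilation, groups):
    # Hypothetical stand-in for the specialized kernel. Here it just calls
    # the stock convolution; a real version would dispatch to a custom
    # CUDA op instead.
    return F.conv2d(x, weight, bias, stride, padding, dilation, groups)


def replace_convs(gm: fx.GraphModule) -> fx.GraphModule:
    """Rewrite every nn.Conv2d call_module node into a call to conv_new."""
    modules = dict(gm.named_modules())
    for node in list(gm.graph.nodes):
        if node.op != "call_module":
            continue
        conv = modules[node.target]
        if not isinstance(conv, nn.Conv2d):
            continue
        with gm.graph.inserting_before(node):
            # Read the parameters of the original layer through get_attr nodes.
            weight = gm.graph.get_attr(f"{node.target}.weight")
            bias = None
            if conv.bias is not None:
                bias = gm.graph.get_attr(f"{node.target}.bias")
            new_node = gm.graph.call_function(
                conv_new,
                args=(node.args[0], weight, bias,
                      conv.stride, conv.padding, conv.dilation, conv.groups),
            )
        # Redirect all consumers to the new node, then drop the old one.
        node.replace_all_uses_with(new_node)
        gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm


model = TwoLayerCNN().eval()
gm = replace_convs(fx.symbolic_trace(model))
x = torch.randn(1, 3, 16, 16)
print(gm.graph)  # the conv call_module nodes are now call_function nodes
```

Since the placeholder still calls `F.conv2d`, `gm(x)` should match `model(x)`, which gives a quick sanity check of the graph rewrite before swapping in an actual CUDA kernel. What this sketch does not cover is the second option, i.e. generating the kernel source itself from the graph.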