How to jit compile with `cupy.cuda.compile_with_cache`

I’m trying to compile the PF-AFN model to TorchScript. The model calls cupy.cuda.compile_with_cache in its forward pass, and I get the following error when I trace it with torch.jit.trace.

NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

CompileException                          Traceback (most recent call last)

cupy/util.pyx in cupy.util.memoize.decorator.ret()

/usr/local/lib/python3.7/dist-packages/cupy/cuda/compiler.py in compile(self, options)
    440         except nvrtc.NVRTCError:
    441             log = nvrtc.getProgramLog(self.ptr)
--> 442             raise CompileException(log, self.src, self.name, options, 'nvrtc')
    443 
    444 

CompileException: /tmp/tmpan1ut480/3b7c153ce98d06488f1cbac8793f6dff_2.cubin.cu(16): error: identifier "tensor" is undefined

1 error detected in the compilation of "/tmp/tmpan1ut480/3b7c153ce98d06488f1cbac8793f6dff_2.cubin.cu".

To Reproduce

Here is a colab notebook that reproduces the error.

And here is a minimal code example:

import cupy
import torch
import torch.nn as nn

# The full CUDA source of kernel_Correlation_rearrange and the cupy_kernel
# helper (which substitutes sizes/strides into the kernel source) come from
# the PF-AFN repository; the kernel source is elided here.
kernel_Correlation_rearrange = " .... "

@cupy.util.memoize(for_each_device=True)
def cupy_launch(strFunction, strKernel):
    # Compile the raw CUDA source and return the named kernel function.
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

    def forward(self, x_warp_after, x_cond):
        # grid/block/args of the launch are left empty in this minimal example;
        # the traceback above shows compilation failing before any launch
        # arguments matter.
        cupy_launch('kernel_Correlation_rearrange', cupy_kernel('kernel_Correlation_rearrange', {
            'intStride': 1,
            'input': x_warp_after,
            'output': x_cond
        }))(
        )
        return x_warp_after, x_cond

net = Net().cuda()
input1 = torch.randn([1, 256, 8, 6]).cuda()
input2 = torch.randn([1, 256, 8, 6]).cuda()
trace_model = torch.jit.trace(net, [input1, input2])
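
For reference (and in case it helps with the question in the title), this is roughly how cupy.cuda.compile_with_cache is used on its own with a raw CUDA kernel. The kernel, its name scale, and the launch configuration below are just placeholders for illustration, not anything from PF-AFN; it is a minimal sketch of the API, not the actual correlation kernel.

import cupy
import torch

# Trivial standalone kernel (placeholder, unrelated to PF-AFN). extern "C"
# keeps the name unmangled so get_function can look it up.
strKernel = '''
extern "C" __global__ void scale(const int n, float* data, const float factor)
{
    int intIndex = blockIdx.x * blockDim.x + threadIdx.x;
    if (intIndex < n) {
        data[intIndex] *= factor;
    }
}
'''

# Compile the raw source with NVRTC (the result is cached) and fetch the kernel.
scale = cupy.cuda.compile_with_cache(strKernel).get_function('scale')

x = torch.randn(1024, device='cuda')
n = x.numel()

# Launch: scalars go in as numpy/cupy scalar types, tensors via data_ptr().
scale(
    grid=tuple([(n + 511) // 512, 1, 1]),
    block=tuple([512, 1, 1]),
    args=[cupy.int32(n), x.data_ptr(), cupy.float32(2.0)]
)
torch.cuda.synchronize()

This is the same eager-mode pattern that cupy_launch wraps in the repro above.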

Expected behavior

I would expect torch.jit.trace to trace the model without errors. As far as I can tell, the error is triggered by the call to cupy.cuda.compile_with_cache inside forward.
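
My guess at the mechanism, which is only an assumption since I have not stepped through cupy_kernel: the helper appears to build the CUDA source by substituting tensor sizes into the kernel string, and under torch.jit.trace those sizes come back as traced Tensors rather than Python ints, so str() puts text like tensor(256) into the generated .cu file, which would explain error: identifier "tensor" is undefined. A minimal sketch (build_source_fragment is a hypothetical stand-in, not the real cupy_kernel):

import torch

def build_source_fragment(x):
    # Hypothetical stand-in for the string substitution cupy_kernel performs.
    return 'const int n = ' + str(x.size(1)) + ';'

x = torch.randn(1, 256, 8, 6)
print(build_source_fragment(x))       # eager mode: 'const int n = 256;'

def forward(x):
    # While tracing, size() values are traced Tensors, so this prints
    # something like 'const int n = tensor(256);'
    print(build_source_fragment(x))
    return x

torch.jit.trace(forward, (x,))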

Environment

  • PyTorch version: 1.8.1+cu101
  • OS: Ubuntu 18.04.5 LTS (x86_64)
  • How you installed PyTorch: pip
  • Python version: 3.7 (64-bit runtime)
  • CUDA/cuDNN version: 11.0.221
  • GPU models and configuration: GPU 0: Tesla T4

Based on the error message, it seems that cupy is unable to compile the PyTorch methods, and I’m unsure whether this would even be supported.
Do you have any resources claiming that this should work, and some examples demonstrating it?

@ptrblck Thank you for your reply!
You can check the example in the following colab.

Thanks for the code! It shows the error, which might be helpful for debugging, but my previous question was about whether this is expected to be supported at all. Do you have any working examples, demos, blog posts, etc. that explain how cupy can be used here? I’m unfamiliar with it.