Hi I have an implementation of a kernel in CUDA C++.
I am calling it from my Python code through the torch library API.
Everything works great (the kernel runs on the GPU on regular calls), but when I try to capture a CUDA graph with this code, I get a message that the graph is empty (and replay does not run the kernel, of course).
I would appreciate any help understanding why this is happening, and any advice on how to capture a kernel in PyTorch that is implemented in CUDA C++.
Thanks.
I remember seeing a similar issue here recently where a user forgot to pass the current CUDAStream to their custom kernel, but I cannot find the thread right now. Could this be the case here, too?
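For reference, here is a minimal sketch of what "passing the stream" looks like in a PyTorch C++ extension. The kernel and launcher names (`my_kernel`, `my_kernel_launcher`) are placeholders for your own code; the key part is fetching the stream with `at::cuda::getCurrentCUDAStream()` instead of launching on the default stream:

```cuda
// Sketch of a C++ launcher that respects PyTorch's current stream.
// Assumes a standard PyTorch C++ extension setup.
#include <ATen/cuda/CUDAContext.h>
#include <torch/extension.h>

__global__ void my_kernel(float* out, const float* in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * 2.0f;  // placeholder computation
}

void my_kernel_launcher(torch::Tensor out, torch::Tensor in) {
  int n = static_cast<int>(in.numel());
  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  // Fetch the stream PyTorch currently considers "current". During graph
  // capture this is the capturing side stream, NOT the legacy default
  // stream, so launching here lets the kernel be recorded into the graph.
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  my_kernel<<<blocks, threads, 0, stream>>>(
      out.data_ptr<float>(), in.data_ptr<float>(), n);
}
```

A launcher that omits the fourth launch parameter (or hard-codes stream 0) works fine for eager calls, which would explain why everything ran correctly outside of capture.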
Wow, thanks!!! This works now. Why is that the case, though?
From the docs:
"Capture must occur on a non-default stream"
which will be set as the current stream inside the context manager and has to be passed to custom functions as well.
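The underlying reason is visible in the raw CUDA API: stream capture is per-stream, so only work issued to the capturing stream (or streams forked from it) gets recorded. A hedged standalone illustration (no torch, hypothetical `dummy` kernel):

```cuda
// Minimal raw-CUDA sketch of why the stream matters for graph capture.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummy() {}

int main() {
  cudaStream_t s;
  cudaStreamCreate(&s);  // capture requires a non-default stream

  cudaGraph_t graph;
  cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
  // Only work issued to `s` is recorded into the graph. A kernel that
  // hard-codes stream 0 bypasses the capture, which (depending on the
  // capture mode) yields either an empty graph or a capture error.
  dummy<<<1, 1, 0, s>>>();
  cudaStreamEndCapture(s, &graph);

  size_t numNodes = 0;
  cudaGraphGetNodes(graph, nullptr, &numNodes);
  printf("captured %zu node(s)\n", numNodes);
  return 0;
}
```

That is why `torch.cuda.graph` both sets a side stream as current and requires custom kernels to actually launch on it: a kernel launched on the default stream simply never enters the graph.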