Debugging PyTorch CUDA Extensions

Are there tools or best practices for mixed Python / C++ / CUDA debugging of torch CUDA extensions (extensions built with CUDAExtension)? I.e., I'd like to run/debug a PyTorch program from Python and then step into a debugger such as cuda-gdb when the extension is called.

Any recommendations greatly appreciated!

You could launch your script with cuda-gdb and set a breakpoint in your extension's kernel. For the breakpoint to resolve to source lines, build the extension with debug info (-g for the host compiler, -g -G for nvcc) so cuda-gdb can step through the device code.
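
A minimal sketch of what such a debug build could look like, assuming a toy extension where `my_extension`, `my_extension.cpp`, `my_extension_kernel.cu`, and `my_script.py` are all placeholder names you'd swap for your own:

```python
# setup.py -- minimal sketch of a debug build for a CUDA extension.
# "my_extension", "my_extension.cpp" and "my_extension_kernel.cu" are
# placeholder names; adapt them to your project.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='my_extension',
    ext_modules=[
        CUDAExtension(
            name='my_extension',
            sources=['my_extension.cpp', 'my_extension_kernel.cu'],
            extra_compile_args={
                'cxx': ['-g', '-O0'],   # host-side debug symbols, no optimization
                'nvcc': ['-g', '-G'],   # -G embeds device debug info for cuda-gdb
            },
        ),
    ],
    cmdclass={'build_ext': BuildExtension},
)
```

With that build, something like `cuda-gdb --args python my_script.py`, then `break my_kernel` (where `my_kernel` stands for your `__global__` function; cuda-gdb will offer to make the breakpoint pending since the extension isn't loaded yet), then `run` should stop at the kernel launch and let you step through the device code. Plain `gdb --args python my_script.py` works the same way if you only need to debug the C++ host side of the extension.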