Cuda.graph with embedding triggering "operation not permitted when stream is capturing"

Thank you for bearing with me. I learned that my small repro was not actually replicating the problem I thought it was. Two things:

  • I was running cuda-memcheck to figure out where things went sideways, and I had PYTORCH_NO_CUDA_MEMORY_CACHING=1 set. That seemed to create problems of its own: with caching disabled, allocations turn into real mallocs inside the graph capture, which is not permitted. Should not use that!
  • I was running the code in a Python shell, which actually made things worse: the shell wanted to print the tensors, and the GPU->CPU copy that printing requires is also illegal during CUDA graph capture.

If I run it as a Python script (not in the REPL), without PYTORCH_NO_CUDA_MEMORY_CACHING=1, then yes, the listed code works just fine.
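For anyone landing here later, this is a minimal sketch of the kind of script that captures cleanly. It is not my original repro. The function name `capture_embedding_lookup`, the tensor shapes, and the index values are made up for illustration. It assumes it is run as a script, with PYTORCH_NO_CUDA_MEMORY_CACHING unset, and it returns None if no GPU is available.

```python
import torch
import torch.nn.functional as F


def capture_embedding_lookup():
    """Capture F.embedding in a CUDA graph, replay it, and return (output, expected) on CPU."""
    if not torch.cuda.is_available():
        return None  # nothing to capture without a GPU

    weight = torch.randn(10, 4, device="cuda")          # illustrative embedding table
    static_idx = torch.tensor([1, 3, 5], device="cuda")  # static input buffer for the graph

    # Warm up on a side stream so lazy initialization happens before capture.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            F.embedding(static_idx, weight)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        # No print(), .cpu(), or .item() in here: any GPU->CPU copy is illegal during capture.
        static_out = F.embedding(static_idx, weight)

    # Refill the static input in place, then replay the captured kernels.
    new_idx = torch.tensor([0, 2, 4], device="cuda")
    static_idx.copy_(new_idx)
    graph.replay()
    torch.cuda.synchronize()

    # Copy to the CPU only after capture and replay are finished.
    return static_out.cpu(), weight[new_idx].cpu()


if __name__ == "__main__":
    result = capture_embedding_lookup()
    if result is None:
        print("CUDA not available, skipped")
    else:
        out, expected = result
        print("replay matches eager lookup:", torch.equal(out, expected))
```

The point is that anything that reads a tensor back (printing, `.cpu()`, `.item()`) stays outside the `torch.cuda.graph` block. That is exactly what the interactive shell was doing behind my back.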

The real bug I was trying to repro seems to be caused by running two streams within a single graph, which PyTorch made a bit trickier as of that commit. It is entirely unrelated to the F.embedding red herring I found myself chasing after using the debugging tools inappropriately.
