Failed CUDA graph capture leaves default stream in invalid state

The following MRE results in an unexpected error:

    import torch

    stream = torch.cuda.Stream(-1)
    graph = torch.cuda.CUDAGraph()
    try:
        with torch.cuda.graph(graph, stream=stream):
            # synchronizing is not allowed during graph capture
            torch.cuda.synchronize()
    except RuntimeError as e:
        print(e)
    del stream
    x = torch.randn(10, device="cuda")

Naturally, the printed exception says that `synchronize` is not something that can be used inside a CUDA graph capture:

    CUDA error: operation failed due to a previous error during capture
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

What was unexpected to me is that the `randn` line also fails:

    >       x = torch.randn(10, device="cuda")
    E       RuntimeError: CUDA generator expects graph capture to be underway, but the current stream is not capturing.

AFAIK this happens because of this line here:

When capture fails, the stream is left with the “Invalidated” status:

A failed capture should not invalidate the CUDA context. I also wonder why `torch.randn` still consults the stream that was used during capture.
Am I doing something wrong here?
If so, how can I recover from a failed capture?

I don’t think that’s the case here: it seems an internal runtime error is raised on the assumption that the capture might still be ongoing, due to its failed status.
What’s your use case, assuming a capture fails? I don’t think it’s easy to recover from internal asserts.

PyTorch not supporting recovery from an unsuccessful capture is good enough for me; I just wanted to make sure this was by design.
Regarding my use case: I was thinking of having the possibility of falling back to eager mode if the capture of a module was unsuccessful.
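For reference, the fallback I had in mind would look roughly like this. It is only a sketch: `run_with_graph_fallback` is a hypothetical helper I made up for illustration, and, per this thread, a failed capture may corrupt the CUDA context, so the eager fallback is not guaranteed to actually work.

    import torch

    def run_with_graph_fallback(fn, *args):
        """Hypothetical helper: try to run fn under CUDA graph capture,
        falling back to plain eager execution if capture fails.

        NOTE: as discussed in this thread, a failed capture can corrupt
        the CUDA context, so the eager fallback may itself fail.
        """
        if not torch.cuda.is_available():
            return fn(*args)  # no GPU at all: just run eagerly
        try:
            g = torch.cuda.CUDAGraph()
            # warmup on a side stream is recommended before capture
            s = torch.cuda.Stream()
            s.wait_stream(torch.cuda.current_stream())
            with torch.cuda.stream(s):
                fn(*args)
            torch.cuda.current_stream().wait_stream(s)
            with torch.cuda.graph(g):
                out = fn(*args)
            g.replay()
            torch.cuda.synchronize()
            return out
        except RuntimeError:
            # capture failed; fall back to eager mode
            return fn(*args)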

OK, I see. Unfortunately, the error looks like a valid CUDA assert and might thus corrupt the CUDA context, so it cannot be recovered from.