Is it safe to create a new GPU tensor during CUDA graph capture?

I learned that when capturing a CUDA Graph, if a new tensor is created, the graph only records its memory address. During the replay phase, the graph will operate on that same address. Is there a risk of accessing an invalid address (if it wasn’t properly allocated) or overwriting memory belonging to other variables? Or does PyTorch’s internal memory pool (caching allocator) guarantee that these addresses remain valid and reserved?

import torch

static_tensor = torch.arange(5, device='cuda:0')
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_tensor.copy_(torch.arange(1, 1 + 5, device='cuda:0'))  # torch.arange allocates a new tensor during capture
g.replay()

According to "4.2. CUDA Graphs" in the CUDA Programming Guide, a memory allocation made during CUDA graph capture is recorded as a memory node:

Graph allocations have fixed addresses over the life of a graph including repeated instantiations and launches. This allows the memory to be directly referenced by other operations within the graph without the need of a graph update, even when CUDA changes the backing physical memory. Within a graph, allocations whose graph ordered lifetimes do not overlap may use the same underlying physical memory.

So I think it's safe to create a new GPU tensor, because its lifetime will be managed by CUDA and the PyTorch memory pool.

PyTorch uses a custom pool for CUDA graph allocations specifically for this reason. So, just to confirm: yes, it is safe to create new tensors during capture.

Yes — it is safe to create new CUDA tensors during graph capture in PyTorch, as long as:

  • You are using the default CUDA caching allocator (the normal PyTorch behavior).
  • The allocation happens inside the capture region.

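In this case, the caching allocator serves capture-time allocations from a memory pool private to the graph and keeps that pool reserved for as long as the graph object is alive, so the recorded virtual addresses remain valid across graph replays.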
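One way to see this in practice is sketched below, assuming a CUDA device is available. The function name `capture_and_replay` is made up for illustration. It allocates `out` inside the capture region, keeps a reference to it, and replays the graph twice with different input values.

```python
import torch


def capture_and_replay():
    """Capture a graph that allocates a tensor, then replay it with new inputs."""
    static_in = torch.zeros(5, device="cuda")

    # Warm-up run outside capture so lazy initialization does not happen mid-capture.
    _ = static_in + 1
    torch.cuda.synchronize()

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        # `out` is allocated during capture, from the graph's private pool.
        out = static_in + 1

    # Update the input in place and replay: the graph writes to the address `out` holds.
    static_in.fill_(10.0)
    g.replay()
    first = out.tolist()  # .tolist() synchronizes before reading

    static_in.fill_(20.0)
    g.replay()
    second = out.tolist()

    return first, second


if torch.cuda.is_available():
    print(capture_and_replay())
```

Each replay overwrites `out` in place, so copy the values out (as `.tolist()` does here) if a result is needed after the next replay.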


For your example specifically:

static_tensor = torch.arange(5, device='cuda:0')

This tensor is allocated outside the graph capture, so its lifetime is NOT managed by the graph. You are responsible for keeping it alive and ensuring it is not freed or resized while the graph is still in use.
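A minimal sketch of the usual pattern for such an externally allocated tensor, assuming a CUDA device is available (the function name `replay_after_inplace_update` is hypothetical): keep the tensor alive and update it in place, because the graph has recorded only its address.

```python
import torch


def replay_after_inplace_update():
    # Allocated outside capture: the graph records only this tensor's address.
    static_tensor = torch.arange(5, device="cuda", dtype=torch.float32)

    # Warm-up run outside capture.
    _ = static_tensor * 2
    torch.cuda.synchronize()

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        doubled = static_tensor * 2

    # Safe: copy_ writes into the existing storage that the graph reads from.
    # Rebinding the name (static_tensor = ...) would not change what the graph
    # reads, and could release the original storage while the graph still uses it.
    static_tensor.copy_(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]))
    g.replay()
    torch.cuda.synchronize()
    return doubled.tolist()


if torch.cuda.is_available():
    print(replay_after_inplace_update())
```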


Regarding this statement:

a memory allocation made during CUDA graph capture is recorded as a memory node

This refers to CUDA graph-level memory nodes, which are created when capturing cudaMallocAsync.

By default, PyTorch does NOT use cudaMallocAsync; it uses its own CUDA caching allocator. If you want PyTorch allocations to be backed by cudaMallocAsync, you must explicitly enable it (see: https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-alloc-conf).

However, this is generally not recommended unless you fully understand it. The default caching allocator already provides correct and safe behavior for CUDA graph capture in typical use cases.
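To check which allocator backend a process is actually using, something like the following should work. It is a small sketch, and the wrapper name `allocator_backend` is made up for illustration. `torch.cuda.get_allocator_backend()` reports the active backend. Switching backends is done through the allocator configuration environment variable described in the linked notes, and that variable has to be set before the process first uses CUDA.

```python
import torch


def allocator_backend():
    """Return the active CUDA allocator backend name, or None if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return None
    # "native" is PyTorch's own caching allocator; "cudaMallocAsync" is the opt-in backend.
    return torch.cuda.get_allocator_backend()


print(allocator_backend())
```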