I learned that when capturing a CUDA Graph, if a new tensor is created, the graph only records its memory address. During the replay phase, the graph will operate on that same address. Is there a risk of accessing an invalid address (if it wasn’t properly allocated) or overwriting memory belonging to other variables? Or does PyTorch’s internal memory pool (caching allocator) guarantee that these addresses remain valid and reserved?
import torch
static_tensor = torch.arange(5, device='cuda:0')
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_tensor.copy_(torch.arange(1, 1 + 5, device='cuda:0'))  # a new temporary tensor is created here, during capture
g.replay()
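To make the question concrete, here is a sketch of what I would expect if the captured address stays valid: the temporary from `torch.arange` should be allocated from the graph's private memory pool during capture, and every `replay()` should re-run the copy against that same recorded address, overwriting whatever is in `static_tensor` at the time. (The side-stream warmup follows the pattern recommended in the PyTorch CUDA Graphs docs; the guard on `torch.cuda.is_available()` is just so the snippet is safe to paste on a CPU-only machine.)

```python
import torch

if torch.cuda.is_available():
    static_tensor = torch.zeros(5, device='cuda:0')

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        static_tensor.copy_(torch.arange(1, 6, device='cuda:0'))
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        # The arange temporary is allocated from the graph's private pool;
        # its address is baked into the captured graph.
        static_tensor.copy_(torch.arange(1, 6, device='cuda:0'))

    static_tensor.zero_()  # clobber the captured result
    g.replay()             # replay re-runs the copy at the recorded address
    print(static_tensor)   # expected: tensor([1, 2, 3, 4, 5], device='cuda:0')
```

If the caching allocator did not keep that pool reserved for the lifetime of the graph, the replay would be writing through a dangling address; as I understand it, `torch.cuda.CUDAGraph` holds on to its private pool precisely to prevent that, but I would like confirmation.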