AOTInductor run_impl uses the process-wide allocator: can two AOTIModelPackageLoader workers in the same process break each other’s CUDA Graphs?

Inspecting the generated xxx.wrapper.cpp, I see something like this (the sizes array int_array_0 is defined earlier in the file):

static constexpr int64_t int_array_1[] = {1L, };  // strides
AtenTensorHandle pool2_handle;
AOTI_TORCH_ERROR_CODE_CHECK(
    aoti_torch_empty_strided(
        1, int_array_0, int_array_1,  // ndim, sizes, strides
        cached_torch_dtype_uint8,
        cached_torch_device_type_cuda,
        this->device_idx_,
        &pool2_handle));
RAIIAtenTensorHandle pool2(pool2_handle);
// … later alloc_from_pool calls …
pool2.reset();  // frees the pool back to the process-wide allocator

My concern: the pool is allocated through aoti_torch_empty_strided, which goes to the process-wide CUDA caching allocator. After loader1 finishes capturing its CUDA Graph and frees its pool, loader2 may be handed the same virtual address for its own pool, while loader1's captured graph still has that pointer baked into its replay. If I then run both loaders concurrently on two different streams, could this cause memory corruption or undefined behavior?
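To make the scenario I'm worried about concrete, here is a toy pure-Python model of a process-wide caching allocator (an analogy only, not the real CUDACachingAllocator): freed blocks go onto a size-keyed free list, so the next same-size request is handed back the identical "virtual address".

```python
class CachingAllocator:
    """Toy stand-in for a process-wide caching allocator."""

    def __init__(self):
        self._free = {}          # size -> list of freed addresses
        self._next_addr = 0x1000

    def alloc(self, size):
        blocks = self._free.get(size)
        if blocks:
            return blocks.pop()  # reuse: the same address comes back
        addr = self._next_addr
        self._next_addr += size
        return addr

    def free(self, addr, size):
        self._free.setdefault(size, []).append(addr)


allocator = CachingAllocator()   # one allocator per process

# loader1 captures a graph: the pool's address is recorded in the replay.
pool1 = allocator.alloc(4096)
captured_addr = pool1            # pointer baked into the captured CUDA Graph
allocator.free(pool1, 4096)      # pool2.reset() after capture

# loader2 now allocates its own pool and receives the very same address.
pool2 = allocator.alloc(4096)
assert pool2 == captured_addr    # aliasing: replaying loader1's graph while
                                 # loader2 writes here would be a data race
```

This is exactly the aliasing I think can happen between the two loaders; I'd like to know whether the real allocator (or CUDA Graph capture's private memory pool) prevents this address reuse.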