CUDA memory allocation for result tensor

Arpit_Agarwal · November 26, 2024, 5:18pm

How does torch.compile decide when to call cudaMalloc inside a compiled function? I have read the PT2 paper and searched for this issue on GitHub, but I am unable to understand the procedure. I am still going through the ASPLOS 2024 resources and the code. Any other helpful resources are welcome.