Hi,
My training run crashed because I ran out of disk space. On further inspection, it seems there are hundreds of GB stored under /tmp/torchinductor_azureuser/triton.
For context: I am training my model with torch.compile and DDP. A simplified sketch of the setup is below.
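This is roughly what my training loop looks like (a minimal sketch, launched with torchrun; the model, batch size, and step count are placeholders for my real ones):

```python
import os
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(1024, 1024).to(f"cuda:{local_rank}")  # stand-in for my real model
model = DDP(model, device_ids=[local_rank])
model = torch.compile(model)  # compiled kernels seem to end up on disk under /tmp

optimizer = torch.optim.AdamW(model.parameters())
for step in range(10_000):  # the real run goes for weeks
    x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    loss = model(x).square().mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```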
What is being stored in that directory? And is there any way to prevent running out of storage during long (multi-week) training runs?
TIA