GPU L1/L2 cache utilization

Hi all,

I was wondering whether there's a way to visualize GPU cache occupancy while training or running inference with a model in PyTorch. I came across commands such as `torch.cuda.memory._record_memory_history(max_entries=100000)`, but I assume this will not partition the memory into global and shared memory (or cache). I'd like to know whether there are any ways to understand fine-grained GPU memory allocation. Thank you!
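
For context, this is roughly how I've been using it so far (a minimal sketch; the toy model and the snapshot filename are just placeholders):

```python
import torch

# Start recording allocator-level memory events (allocations/frees of
# device global memory); as far as I can tell, this does not break usage
# down into L1/L2 cache or shared memory.
torch.cuda.memory._record_memory_history(max_entries=100000)

# Placeholder workload: a few forward passes on a toy model.
model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")
for _ in range(10):
    y = model(x)

# Dump a snapshot that can be opened in the viewer at pytorch.org/memory_viz.
torch.cuda.memory._dump_snapshot("memory_snapshot.pickle")

# Stop recording.
torch.cuda.memory._record_memory_history(enabled=None)
```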