Thank you a lot for your answers. I am now able to fit my maximum-length sequences (around 800) with a large batch size (1028). For now I'll stick to a small batch size so that I avoid the effect of caching very differently sized intermediate activations across batches of varying size.
Can I ask you one last question, please: is there any way to understand the inner workings of the memory management? I couldn't understand it well from PyTorch's documentation alone.
I ran the following loop on my smaller dataset, with a variable max sequence length, just to iterate quickly and see how it affects the allocated and cached memory:
import torch

# Memory stats before any batch is moved to the GPU (values in MiB)
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_cached() / 1024**2)  # deprecated alias of torch.cuda.memory_reserved
print()

for batch in data_loader:  # data_loader and device are set up earlier
    examples, labels = batch
    examples = torch.squeeze(examples)
    print(examples.size())
    examples = examples.to(device)
    # memory held by live tensors vs. memory reserved by the caching allocator, in MiB
    print(torch.cuda.memory_allocated() / 1024**2)
    print(torch.cuda.memory_cached() / 1024**2)
    print("-----")
And I got the following output:
0.0
0.0
/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py:416: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
torch.Size([4092, 21])
0.65576171875
2.0
-----
torch.Size([4092, 18])
1.2177734375
2.0
-----
torch.Size([4092, 19])
1.1552734375
2.0
-----
torch.Size([4092, 21])
1.2490234375
2.0
-----
torch.Size([4092, 23])
1.3740234375
2.0
-----
torch.Size([4092, 30])
1.6552734375
2.0
-----
torch.Size([4092, 35])
2.02978515625
22.0
-----
torch.Size([357, 48])
1.06787109375
22.0
-----
I would love to understand why the allocated memory decreased when going from torch.Size([4092, 35]) to torch.Size([357, 48]), and I'd love to be able to compute for myself when the cached memory will increase (for example, why it jumps at torch.Size([4092, 35]) and not at torch.Size([4092, 30])). All batches have the same data type, torch.int64.
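For reference, this is the back-of-the-envelope sketch I use to compute the raw tensor sizes (tensor_mib is just my own throwaway helper, and I'm assuming 8 bytes per torch.int64 element; how the allocator rounds these and picks its block sizes is exactly the part I can't work out from the documentation):

def tensor_mib(shape, bytes_per_elem=8):
    # raw size of a tensor with the given shape, in MiB, before any allocator rounding
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 1024**2

print(tensor_mib((4092, 30)))  # ~0.94 MiB: reserved memory stayed at 2.0 MiB here
print(tensor_mib((4092, 35)))  # ~1.09 MiB: reserved memory jumped from 2.0 to 22.0 MiB here
print(tensor_mib((357, 48)))   # ~0.13 MiB: the much smaller last batch

From these numbers it looks like the reserved memory only jumped once a single batch tensor crossed roughly 1 MiB, but I can't tell whether that is the actual rule or just a coincidence in my data.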
Thank you a lot again! It feels satisfying to be able to pinpoint the issue ^^