Do you have a script, or ideas on how to write one, to monitor the memory usage of all the data-loading processes, for educational purposes?
I tried adapting this blog article and code:
import torch
from torch.utils.data import Dataset, DataLoader

from common import MemoryMonitor  # from the blog's accompanying code


class ToyDataset(Dataset):
    def __init__(self, shape):
        self.data = torch.zeros(shape)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        return self.data[idx]


shape = ...  # tuple, (num samples, height, width)
dataset = ToyDataset(shape)
dataloader = DataLoader(dataset, batch_size=32, num_workers=4, persistent_workers=True)
it = iter(dataloader)  # creating the iterator spawns the worker processes

monitor = MemoryMonitor()
for w in it._workers:  # register each worker PID with the monitor
    monitor.add_pid(w.pid)
print(f"Single Dataset, {dataloader.num_workers} workers\n", monitor.table())
From the blog above:
By definition, we should use total PSS to count the total RAM usage of N processes.
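If I read that right, the total RAM usage of the main process plus its workers would be computed roughly like this (my own helper, sketched with psutil; pss is only available on Linux):

import psutil

def total_pss_gb(pids):
    # PSS splits shared pages evenly among the processes that map them,
    # so summing pss over all processes does not double-count the shared dataset.
    return sum(psutil.Process(pid).memory_full_info().pss for pid in pids) / 2**30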
For the same number of workers, the memory used by each worker scales with the size of the data, as expected.
I’m not sure how to interpret the outputs when keeping the size of the dataset constant and varying the number of workers, e.g. going from num_workers=2 to num_workers=4:
# For a dataset of shape (512, 1024, 1024) and batch size 32
Single Dataset, 2 workers
time PID rss pss uss shared shared_file
------ ----- ----- ------ ------ -------- -------------
21097 98405 2.4G 930.6M 157.1M 2.3G 43.4M
21097 98598 2.4G 889.7M 119.4M 2.3G 31.0M
21097 98600 2.4G 906.9M 136.3M 2.3G 31.0M
# For a dataset of shape (512, 1024, 1024) and batch size 32
Single Dataset, 4 workers
time PID rss pss uss shared shared_file
------ ------ ----- ------ ------ -------- -------------
21996 102287 2.4G 620.7M 158.5M 2.2G 43.4M
21996 102488 2.4G 583.5M 126.2M 2.2G 31.3M
21996 102490 2.4G 598.1M 140.6M 2.2G 31.5M
21996 102492 2.3G 547.9M 90.3M 2.2G 31.6M
21996 102494 2.3G 566.1M 108.3M 2.2G 31.6M
The memory usage per worker seems to decrease as the number of workers increases. Do you know if this is expected, or if it’s an issue with my example?
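For what it’s worth, summing the pss column of the two runs above (the first row in each table is, I believe, the main process) gives roughly the same total:

pss_2_workers = [930.6, 889.7, 906.9]                # MB, main process + 2 workers
pss_4_workers = [620.7, 583.5, 598.1, 547.9, 566.1]  # MB, main process + 4 workers
print(f"{sum(pss_2_workers) / 1024:.2f} GB")  # ~2.66 GB
print(f"{sum(pss_4_workers) / 1024:.2f} GB")  # ~2.85 GB

So the per-process PSS drops, but the total over all processes stays in the same ballpark.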