RAM usage scales with number of GPUs?

For DDP, each process manages one GPU by default and has its own dataloader, so each process maintains its own copy of the model weights and the dataset in RAM, as you observed. If you would like a single shared dataset across the processes, you can write a custom dataset backed by shared memory; for example, this post describes how to do it: How to cache an entire dataset in multiprocessing?
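
As a minimal sketch of the shared-memory idea (class and function names here are illustrative, not from any library): you can move the dataset's tensors into shared memory with `share_memory_()` before spawning the worker processes, so every process reads the same buffer instead of holding its own copy.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import Dataset

# Hypothetical dataset that caches all samples in shared-memory tensors,
# so spawned DDP processes reference one copy instead of duplicating it.
class SharedTensorDataset(Dataset):
    def __init__(self, data: torch.Tensor, targets: torch.Tensor):
        # share_memory_() moves the underlying storage into shared memory;
        # processes created afterwards see the same buffer, not a copy.
        self.data = data.share_memory_()
        self.targets = targets.share_memory_()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

def worker(rank: int, dataset: Dataset):
    # In a real DDP setup each rank would build its own DataLoader
    # over this shared dataset; here we just read one sample.
    x, y = dataset[0]
    print(f"rank {rank}: sample shape {tuple(x.shape)}, label {y.item()}")

if __name__ == "__main__":
    dataset = SharedTensorDataset(torch.randn(1000, 3, 32, 32),
                                  torch.randint(0, 10, (1000,)))
    mp.spawn(worker, args=(dataset,), nprocs=2)
```

Note that this only avoids duplicating the data itself; each DDP process still keeps its own copy of the model weights, which is expected.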
