Pass large data into mp.spawn

Hello,

I have a question about the Getting Started with Distributed Data Parallel tutorial (PyTorch Tutorials 2.4.0+cu121 documentation).

There are large collator and dataset objects that I'm passing to the train function like this:

if world_size > 0:
    mp.spawn(train_gpu, args=(world_size, model, collator, dataset),
             nprocs=world_size, join=True)

In terms of memory usage, is it better to create these large objects inside train_gpu() in each worker, or to pass them in through args= as above?
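For reference, here is a minimal sketch of the two patterns I'm comparing. My understanding (an assumption, not verified) is that mp.spawn pickles everything in args= and ships a copy to each worker, so the pickled size below approximates the per-worker transfer cost; make_dataset and the two train_gpu variants are hypothetical stand-ins for my real code:

```python
import pickle

def make_dataset():
    # hypothetical stand-in for the real (large) dataset construction
    return list(range(1_000_000))

def train_gpu_v1(rank, world_size, dataset):
    # pattern 1: dataset arrives pickled from the parent process
    pass

def train_gpu_v2(rank, world_size):
    # pattern 2: each worker builds its own copy, so nothing large
    # crosses the pickle boundary at spawn time
    dataset = make_dataset()
    pass

# Rough per-worker cost of pattern 1: the pickled payload that would be
# copied to every spawned worker.
passed_bytes = len(pickle.dumps(make_dataset()))
print(passed_bytes)
```

If that assumption holds, pattern 1 pays this serialization cost once per worker, while pattern 2 pays the construction cost once per worker instead.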

Thank you!