Hello,
I have a question about the "Getting Started with Distributed Data Parallel" tutorial (PyTorch Tutorials 2.4.0+cu121 documentation).
I have large collator and dataset objects that I'm passing to the train function like this:
if world_size > 0:
    mp.spawn(train_gpu, args=(world_size, model, collator, dataset),
             nprocs=world_size, join=True)
In terms of memory usage, is it better to create these large objects inside train_gpu() on each worker, or to pass them through mp.spawn() as above? A rough sketch of the first option is below.
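For reference, this is roughly what I mean by creating the objects inside train_gpu() instead (build_dataset() and build_collator() are just placeholders for however the objects are actually constructed):

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def train_gpu(rank, world_size):
    # Each spawned worker sets up the process group and then builds its own
    # dataset/collator, instead of receiving pickled copies via mp.spawn's args.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    dataset = build_dataset()    # placeholder: construct the dataset here
    collator = build_collator()  # placeholder: construct the collator here

    # ... build the model, wrap it in DDP, and run the training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train_gpu, args=(world_size,), nprocs=world_size, join=True)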
Thank you!