How to collect tensors on all GPUs for each batch and save them

Hi, I want to collect the tensors on all GPUs for each minibatch and save them. Can someone suggest how to do that?


If you are using DDP (DistributedDataParallel), you can save them the same way you would without DDP (using torch.save), because every process (i.e. every GPU) runs the saving code. Include the GPU index in the filename so that different processes don't write to the same file.

I want to collect the tensors on all GPUs for each minibatch and save them.

Do you want all tensors to be on a single process before saving?

You can save a tensor using torch.save (see the torch.save page in the PyTorch documentation).
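For reference, a minimal save/load round trip looks like this (the filename x.pt is just a placeholder):

import torch

x = torch.rand(2, 3)
torch.save(x, 'x.pt')          # serialize the tensor to disk
x_loaded = torch.load('x.pt')  # read it back later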

Yes, DDP is fine. The tensors can stay on the GPUs. Each tensor should be saved to its own file, and I want to make sure no tensor is saved twice or missed. Can you suggest some sample code?

To avoid repetition you can include the GPU index in the filename when saving. Something like the following:

import os
import torch
import torch.distributed
import torch.multiprocessing

def save_tensors(gpu, total_gpus):
    torch.cuda.set_device(gpu)
    torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=total_gpus, rank=gpu)
    for i in range(100):
        # each rank saves its own tensor; the GPU index in the filename prevents name collisions
        tensor = torch.rand(2, 3, device=f'cuda:{gpu}')
        torch.save(tensor, f'tensor_{i}_gpu_{gpu}.pt')
    torch.distributed.destroy_process_group()

def main():
    gpu_count = torch.cuda.device_count()
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '1234'
    torch.multiprocessing.spawn(save_tensors, nprocs=gpu_count, args=(gpu_count,))

if __name__ == '__main__':
    main()
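If you instead want the tensors from every GPU collected onto a single process before saving (the option asked about above), a rough sketch using all_gather could look like the following. It assumes the tensor has the same shape on every rank and is launched with the same kind of spawn call as main() above; the save_gathered name and the tensors_{i}.pt filename are just placeholders:

import torch
import torch.distributed

def save_gathered(gpu, total_gpus):
    torch.cuda.set_device(gpu)
    torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=total_gpus, rank=gpu)
    for i in range(100):
        tensor = torch.rand(2, 3, device=f'cuda:{gpu}')  # stands in for this rank's minibatch tensor
        # all_gather needs one pre-allocated buffer per rank, all with the same shape
        gathered = [torch.empty_like(tensor) for _ in range(total_gpus)]
        torch.distributed.all_gather(gathered, tensor)
        if gpu == 0:
            # only rank 0 writes, so each minibatch produces exactly one file
            torch.save(torch.stack(gathered).cpu(), f'tensors_{i}.pt')
    torch.distributed.destroy_process_group()

Here torch.stack just bundles the per-rank tensors into one tensor of shape (world_size, 2, 3) so that each minibatch produces a single file.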