torch.multiprocessing || memory allocator

Hi,
I am interested in using the torch.multiprocessing module. The example given in the documentation is this one:

import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

So as far as I can see, it spawns 4 processes that each run the train function, and each train function constructs its own data loader and so on.
I was wondering how the multiprocessing module deals with GPU memory. Does it check the memory available at the time each process is started, or does it just try to start them all and throw an 'out of memory' error if there is not enough memory?

If it is the second case, is there a way to check beforehand whether enough memory is available? I guess this could be done with torch.cuda.memory_allocated(), but I would like to understand how the "caching memory allocator" works. When is an allocator created? If I run two different processes on the same GPU, will there be memory conflicts? Is there a way to share an allocator between the processes to avoid that?
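
For context, this is roughly the kind of check I had in mind before spawning the workers. It is just a sketch: it assumes a recent PyTorch where torch.cuda.mem_get_info() is available, and per_worker_bytes is a hypothetical estimate I would have to measure for my own model.

import torch

def enough_memory(num_processes, per_worker_bytes, device=0):
    # mem_get_info() wraps cudaMemGetInfo and reports free/total bytes
    # for the whole device, including memory held by other processes.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes >= num_processes * per_worker_bytes

# These only report what *this* process's caching allocator has
# allocated/reserved, so they do not see other processes' memory.
print(torch.cuda.memory_allocated(0), torch.cuda.memory_reserved(0))

Is that a reasonable way to check, or does the caching allocator make those numbers misleading across processes?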

Thanks in advance
