CUDA memory leak

Hi, I am trying to train several models in parallel using torch.multiprocessing.Pool. Since my setup has multiple GPUs, I also pass a device to each training task so that the model is trained on that particular device.

The problem I face is that after a while I get:

RuntimeError: CUDA error: out of memory

This happens after several models have been trained, and I can clearly see with watch nvidia-smi that the GPU memory accumulates over time.

I have posted a minimal example below which also leads to the stated issue.

Is there something that needs to be done at the end of run() to clean up the memory?
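For context, I wondered whether an explicit cleanup at the end of run() would be enough, something like the sketch below (the inline comments are my guesses, not a confirmed fix):

```python
import gc
import torch

def run(device: str) -> None:
    model = torch.nn.Linear(10, 10).to(device)
    # ... training loop as in my example below ...
    # tentative cleanup; I am not sure this is the right fix:
    del model                  # drop the last Python reference to the model
    gc.collect()               # force collection of any lingering cycles
    torch.cuda.empty_cache()   # release cached blocks (no-op if CUDA is unused)

run("cpu")  # also runs on CPU, where empty_cache() does nothing
```

Even with this, nvidia-smi still seems to report memory held by the child processes, so I am not sure the cached allocator ever hands memory back before the process exits.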

PyTorch version: torch 1.6.0

import torch
from torch.multiprocessing import Pool
import random

class SimpleModule(torch.nn.Module):
    def __init__(self, input_: int, output: int):
        super().__init__()  # required, otherwise nn.Module is not initialized
        self.linear = torch.nn.Linear(input_, output)

    def forward(self, input_: torch.Tensor):
        return self.linear(input_)

def run(device: str):
    model = SimpleModule(10, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e+2)
    loss_criterion = torch.nn.MSELoss()
    for i in range(int(1e+4)):
        inputs = torch.rand(5, 10).to(device)
        outputs = torch.rand(5, 10).to(device)
        preds = model(inputs)
        loss = loss_criterion(preds, outputs)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    torch.multiprocessing.set_start_method("spawn", force=True)
    n_tasks = 500
    available_device = ["cpu"] if not torch.cuda.is_available() else [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    devices = random.choices(available_device, k=n_tasks)
    with Pool(processes=50) as pool:
        pool.map(run, devices)