Multiprocessing to preload a lot of files gets stuck in the training loop

Hi, the thing is that I use multiprocessing to load my training samples into RAM in the __init__ function of my dataset. When I test the dataset on its own, everything works just fine, but when I use it in my training loop, it gets stuck while preloading the files into RAM. It does not report any errors, it just hangs. The code is shown below:

from torch.utils import data
from multiprocessing import Pool
import tqdm

class MYDATA(data.Dataset):
    def __init__(
        self,
        datadir,
        **kwargs
    ):
        self.samples = datadir
        self.content = preload(self.samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.content[index]

def preload(sampledirs):
    # loading_function (defined elsewhere) reads a single sample from disk.
    with Pool(4) as p:
        tmp_content = list(
            tqdm.tqdm(
                p.imap(loading_function, sampledirs),
                total=len(sampledirs),
                desc="Preloading views to RAM with %d processes >>> " % 4,
                ncols=100,
            )
        )
    return tmp_content
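
For context, the dataset is consumed by a normal DataLoader in the training script, roughly like this (a minimal sketch; the paths, batch size, number of workers, and training step are placeholders for my real setup, and MYDATA / loading_function are the ones above):

# Rough sketch of the training-side usage; everything below is a placeholder
# for my real paths, loader settings, and training step.
if __name__ == "__main__":
    sample_paths = ["sample_0.npy", "sample_1.npy"]   # placeholder paths
    dataset = MYDATA(datadir=sample_paths)            # hangs here, inside preload()
    loader = data.DataLoader(dataset, batch_size=2, shuffle=True, num_workers=4)

    for epoch in range(10):
        for batch in loader:
            pass  # forward / backward pass goes here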

What is wrong? Thank you so much!!

Hi @xwjBupt,
I think you need to use a list from the multiprocessing Manager, as shown below. Can you test this code to see if it works for you?

from torch.utils import data
from multiprocessing import Pool, Manager
import tqdm

class MYDATA(data.Dataset):
    def __init__(
        self,
        datadir,
        **kwargs
    ):
        self.samples = len(datadir)
        with Manager() as manager:
            self.content = manager.list()
            self.content += preload(datadir)
    
    def __len__(self):
        return self.samples
    
    def __getitem__(self, index):
        return self.content[index]

def preload(sampledirs):
    with Pool(4) as p:
        tmp_content = list(
            tqdm.tqdm(
                p.imap(loading_function, sampledirs),
                total=len(sampledirs),
                desc="Preloading views to RAM with %d threads >>> " % 4,
                ncols=100,
            )
        )
    return tmp_content
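
For example, you could sanity-check it outside the training loop with something like this (just a sketch; loading_function and the sample paths here are placeholders for your real loader and data):

# Placeholder loading function, only for this quick check; replace it with
# whatever actually reads one sample from disk.
def loading_function(path):
    return path

if __name__ == "__main__":
    dataset = MYDATA(datadir=["a", "b", "c"])
    print("loaded", len(dataset), "samples")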

Thank you so much for replying, but it does not help; the behavior is still the same.