How to properly lock a dataset when using multiple workers in dataloading?

Hello,

I’m facing an issue when writing my custom dataset object that I think is linked to the way multiprocessing is implemented in dataloading.

My use case: my dataset object opens a binary file handle on a single very large file, and every time I need to return the i-th sample, I move the file handle with seek and read the corresponding number of bytes.

As the file handle is opened in the constructor, my initial belief was that I needed to protect the “seek/read” part of my code with a multiprocessing Lock, but I have the impression that I’m not doing it properly, or that this is not the right way to do it.

I could provide the full code, but it is rather long and depends on the structure of my binary data format; in practice, the layout is the following:

import os
import struct
from multiprocessing import Lock

import torch.utils.data as data


class Dataset(data.Dataset):

    def __init__(self, filepath):
        super().__init__()
        self.fp = open(filepath, 'rb')
        self.fplock = Lock()
        self.row_format = '<?ifff'
        self.row_size = struct.calcsize(self.row_format)

    def __getitem__(self, idx):
        file_offset = .....  # computed as a function of idx
        with self.fplock:
            self.fp.seek(file_offset, os.SEEK_SET)
            row = self.fp.read(self.row_size)
            values = struct.unpack(self.row_format, row)
        return values

    def __len__(self):
        return xxxxx 

dataset = Dataset(filepath)
loader = data.DataLoader(dataset, batch_size=xxxx, num_workers=7)

for X in loader:
      ...

Isn’t this the correct way to proceed? When I set num_workers=1, the data are read correctly, but if I set num_workers > 1, I can see that my data are not always correctly decoded. My feeling is that the worker processes are interfering with the seek/read, although I protect the critical section with a multiprocessing Lock.

Thank you for your help.

Jeremy.

Did you figure out a solution? I am having the same problem and am starting to suspect that Locks simply don’t work with torch.

Sorry for the late reply, I did not see your message.

Actually no. I have not found any solution.

The thing I had in mind and wanted to test was to open the file in the worker_init_fn.

Actually, it has been a while now, but my idea was that every worker could have its own file handle rather than one global handle that is locked. I have not tried that yet.
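For what it’s worth, here is a minimal, untested sketch of what I had in mind. The class name BinDataset, the fixed-size-row offset computation, the __len__ derived from the file size, the file name and the batch size are just placeholders for illustration; the point is only that each worker opens its own handle in worker_init_fn, so no Lock should be needed because no two processes ever share a file position.

import os
import struct

import torch
from torch.utils.data import DataLoader, Dataset


class BinDataset(Dataset):

    def __init__(self, filepath):
        super().__init__()
        self.filepath = filepath
        self.fp = None  # opened per process, not here in the parent
        self.row_format = '<?ifff'
        self.row_size = struct.calcsize(self.row_format)

    def __getitem__(self, idx):
        if self.fp is None:
            # Fallback for num_workers=0, where worker_init_fn is never called
            self.fp = open(self.filepath, 'rb')
        file_offset = idx * self.row_size  # placeholder: fixed-size rows
        self.fp.seek(file_offset, os.SEEK_SET)
        row = self.fp.read(self.row_size)
        return struct.unpack(self.row_format, row)

    def __len__(self):
        return os.path.getsize(self.filepath) // self.row_size


def worker_init_fn(worker_id):
    # Each worker process gets its own copy of the dataset object; give that
    # copy a private file handle so a seek/read in one worker cannot disturb
    # the file position seen by another worker.
    info = torch.utils.data.get_worker_info()
    info.dataset.fp = open(info.dataset.filepath, 'rb')


dataset = BinDataset('data.bin')
loader = DataLoader(dataset, batch_size=256, num_workers=7,
                    worker_init_fn=worker_init_fn)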

Jeremy.