num_workers > 0 causes incorrect training

I am on Windows 11 with PyTorch 1.13.1. When I set num_workers > 0, training is faster. However, faster is one thing and correct is another: the loss is not decreasing and the results are a mess.
If I set num_workers = 0 (the only code change), training is correct.

I am using a custom Dataset:

class DefaultDataset(Dataset):
    def __init__(self,
                 inputLoader: InputLoader,
                 settings: Settings):
        self._i = 0
        self._curIndex = -1
        self.inputLoader = inputLoader
        self.settings = settings

    def setData(self, res: DataForEval, data):
        res.setInput(torch.as_tensor(data, dtype=torch.float))

    def getSize(self) -> int:
        return self.inputLoader.getSize()

    def createData(self, data):
        res = DataForEval()
        self.setData(res, data)
        return res

    def getProgress(self):
        return self._i / len(self)

    def __len__(self):
        return self.getSize()

    def __getitem__(self, index):
        self._i += 1
        self._curIndex = index

        if self.settings.cache is not None:
            data = self.settings.cache.getData(index)
            if data is not None:
                return data.toDict()

        data = self.inputLoader.getData(index)
        res = self.createData(data)

        if isinstance(res, DataForEval):
            if self.settings.cache is not None:
                self.settings.cache.addData(index, res)
            return res.toDict()
        return res

InputLoader is my class that loads raw data from files. DataForEval is a wrapper around a dictionary with some helper methods, and cache is again a dictionary wrapper that stores loaded data under a key (the index).

Based on your code snippet it seems you are using a few custom classes with some kind of caching mechanism.
Could you describe how these interact with Python’s multiprocessing? I would guess the issue might come from the assumption that all objects are accessed from a single thread/process.

This is the getData method I have inside InputLoader:

    def getData(self, index) -> Tuple[torch.Tensor, torch.Tensor]:
        fileId = self.fileIds[index]
        fileName = self.fileNames[index]

        # second argument assumed to be the Compose transforms mentioned
        # below; both calls were cut off in the post
        imgTensor = ImageUtils.loadImageAsTensor(fileName, self.transforms)
        mask = ImageUtils.loadImageAsTensor(f"{self.dataRoot}/{fileId}.png", self.transforms)

        # binarize the mask
        mask[mask > 0.5] = 1.
        mask[mask <= 0.5] = 0.

        return imgTensor, mask

transforms are created with transforms.Compose; ImageUtils.loadImageAsTensor loads an image from a file using the Pillow library.

fileIds and fileNames are filled in the __init__ method by iterating the directory structure and collecting the file names.

I don’t see any sync problem here.

During the first epoch the cache is only written to, never read. And already during the first epoch the loss is oscillating and the model is not training.

DataForEval class methods:

class DataForEval:

    def __init__(self):
        # backing dict; the attribute name was mangled in the post,
        # `_data` is a stand-in
        self._data = {}

    def setInput(self, input: torch.Tensor):
        self._data['input'] = input

    def setTarget(self, target: torch.Tensor):
        self._data['target'] = target

    def setAdditionalData(self, name: str, data: torch.Tensor):
        self._data[name] = data

    def setDataIndex(self, index: int):
        self._data['i'] = index

    def toDict(self) -> Dict:
        return self._data

I have found the issue. DefaultDataset was holding a reference to the optimizer via the Settings class instance. Under multiprocessing this zeroed out all the weights in the model, so it was not training.
Removing the reference to the optimizer from the code used in the worker processes solved the issue.
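The mechanism behind the fix can be sketched like this (class names are simplified stand-ins, not the real classes from the post): everything reachable from the Dataset is pickled into every spawned worker, so training state such as the optimizer should never hang off objects the Dataset holds.

```python
import pickle

class Optimizer:          # stand-in for torch.optim.*
    pass

class Settings:
    def __init__(self, optimizer=None):
        self.optimizer = optimizer

class Dataset:
    def __init__(self, settings):
        self.settings = settings

# A Dataset holding the optimizer: the whole object graph, optimizer
# included, is pickled into every worker (on Windows, via spawn).
bad = Dataset(Settings(optimizer=Optimizer()))
clone = pickle.loads(pickle.dumps(bad))
print(clone.settings.optimizer is bad.settings.optimizer)  # a copy, not the same object

# Keeping the optimizer out of the Settings instance handed to the
# Dataset means no training state is shipped to (or touched by) workers.
good = Dataset(Settings(optimizer=None))
```

A worker operating on such a copy can then silently diverge from (or, as here, corrupt) the real training state, which matches the symptom of a loss that oscillates only when num_workers > 0.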