Multiprocessing dataloader returns incorrect data in first iterations

Dear readers, I have a problem with the PyTorch DataLoader when using multiprocessing. I have a dataset that contains PyTorch Geometric DataBatch items. With a single process (num_workers=0) I can train the model without issues, but it is slow, so I want to use multiple workers. However, when I enable that, the first item returned by each worker is incorrect: all the tensors the DataBatch contains are set to zero. I illustrate this below by printing only the tensor called ‘batch’. The following code reproduces my problem:

from Pointcloud.Modules import Config as config
from Pointcloud.Modules.FileDataset import FileDataset
from torch_geometric.loader import DataLoader as tg_loader_Dataloader

training_dataset = FileDataset(
    config.DATA_DIR,
    dataset_idx=0,
    split_name=config.SPLIT_NAME,
    split_distribution=config.SPLIT
)

print(f"Dataset: {training_dataset}\nExample item: {training_dataset[0]}")

single_processor_dataloader = tg_loader_Dataloader(
    dataset=training_dataset,
    batch_size=16,
    shuffle=False,
    num_workers=0
)
multi_processor_dataloader = tg_loader_Dataloader(
    dataset=training_dataset,
    batch_size=16,
    shuffle=False,
    num_workers=4,
    persistent_workers=True
)

for i, v in enumerate(single_processor_dataloader):
    print(f"Single Processor: idx: {i} --> value: {v}\n    batch value: {v.batch.unique()}")
    if i >= 8:
        break
for i, v in enumerate(multi_processor_dataloader):
    print(f"Multi Processor: idx: {i} --> value: {v}\n    batch value: {v.batch.unique()}")
    if i >= 8:
        break

Output:

Dataset: FileDataset(25008)
Example item: Data(x=[59, 8], edge_index=[2, 357], y=[1, 3])
Single Processor: idx: 0 --> value: DataBatch(x=[812, 8], edge_index=[2, 4902], y=[16, 3], batch=[812], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 1 --> value: DataBatch(x=[738, 8], edge_index=[2, 4472], y=[16, 3], batch=[738], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 2 --> value: DataBatch(x=[752, 8], edge_index=[2, 4546], y=[16, 3], batch=[752], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 3 --> value: DataBatch(x=[762, 8], edge_index=[2, 4608], y=[16, 3], batch=[762], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 4 --> value: DataBatch(x=[776, 8], edge_index=[2, 4732], y=[16, 3], batch=[776], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 5 --> value: DataBatch(x=[768, 8], edge_index=[2, 4632], y=[16, 3], batch=[768], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 6 --> value: DataBatch(x=[750, 8], edge_index=[2, 4576], y=[16, 3], batch=[750], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 7 --> value: DataBatch(x=[750, 8], edge_index=[2, 4512], y=[16, 3], batch=[750], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Single Processor: idx: 8 --> value: DataBatch(x=[777, 8], edge_index=[2, 4691], y=[16, 3], batch=[777], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Multi Processor: idx: 0 --> value: DataBatch(x=[812, 8], edge_index=[2, 4902], y=[16, 3], batch=[812], ptr=[17])
    batch value: tensor([0], device='cuda:0')
Multi Processor: idx: 1 --> value: DataBatch(x=[738, 8], edge_index=[2, 4472], y=[16, 3], batch=[738], ptr=[17])
    batch value: tensor([0], device='cuda:0')
Multi Processor: idx: 2 --> value: DataBatch(x=[752, 8], edge_index=[2, 4546], y=[16, 3], batch=[752], ptr=[17])
    batch value: tensor([0], device='cuda:0')
Multi Processor: idx: 3 --> value: DataBatch(x=[762, 8], edge_index=[2, 4608], y=[16, 3], batch=[762], ptr=[17])
    batch value: tensor([0], device='cuda:0')
Multi Processor: idx: 4 --> value: DataBatch(x=[776, 8], edge_index=[2, 4732], y=[16, 3], batch=[776], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Multi Processor: idx: 5 --> value: DataBatch(x=[768, 8], edge_index=[2, 4632], y=[16, 3], batch=[768], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Multi Processor: idx: 6 --> value: DataBatch(x=[750, 8], edge_index=[2, 4576], y=[16, 3], batch=[750], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Multi Processor: idx: 7 --> value: DataBatch(x=[750, 8], edge_index=[2, 4512], y=[16, 3], batch=[750], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')
Multi Processor: idx: 8 --> value: DataBatch(x=[777, 8], edge_index=[2, 4691], y=[16, 3], batch=[777], ptr=[17])
    batch value: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       device='cuda:0')

For indices 0 to 3 of the multi-process loader, you can see that the data differs from the single-process loader. How can this happen? It causes my model to crash during the first training step.

Try to debug it by adding print statements to __getitem__ showing the worker id as well as the index being used. This should show that each worker is only loading a single sample, and you could then debug why that is the case.
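
For illustration, a minimal sketch of what such a debug print could look like, assuming FileDataset subclasses PyG's InMemoryDataset (the class body shown here is only illustrative; everything except the print is your own code):

```python
import torch
from torch.utils.data import get_worker_info
from torch_geometric.data import InMemoryDataset


class FileDataset(InMemoryDataset):
    # ... __init__, raw_file_names, process, etc. unchanged ...

    def __getitem__(self, idx):
        # get_worker_info() returns None in the main process and a
        # WorkerInfo object (with an .id field) inside a worker process.
        worker = get_worker_info()
        worker_id = worker.id if worker is not None else "main"
        print(f"worker: {worker_id}, idx: {idx}")
        return super().__getitem__(idx)
```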


Thank you for your response! I tried implementing __getitem__ for my FileDataset object (which is an InMemoryDataset from PyTorch Geometric). It only printed statements for the single-process DataLoader, and I wondered why it didn't for the multi-process DataLoader.

However, while searching I ended up on the torch.utils.data documentation and read the second warning in the multi-process data loading section, which advises against returning CUDA tensors from worker processes. It turned out that my dataset tensors were already stored on the GPU when loading them from files. I changed the dataset to load the tensors on the CPU, and now it works :) I can't fully explain the behavior of the DataLoader, but I'm happy it works again.
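
For reference, a minimal sketch of the pattern that follows that advice: keep all Dataset tensors on the CPU and move each batch to the GPU inside the training loop. The pin_memory flag and device handling below are just illustrative, not part of my original setup:

```python
import torch
from torch_geometric.loader import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dataset items stay on the CPU, so worker processes only handle CPU tensors.
loader = DataLoader(
    training_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    persistent_workers=True,
    pin_memory=True,  # speeds up the host-to-GPU copy below
)

for batch in loader:
    batch = batch.to(device)  # transfer to the GPU in the main process
    # ... forward / backward pass ...
```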