File gets executed multiple times when using num_workers > 0 in torch.utils.data.DataLoader()

I ran into a rather curious problem while trying to implement this semi-supervised representation learning algorithm:

(tl;dr: Pretrain a CNN by letting it solve jigsaw puzzles. Each image is cut into nine tiles, the tiles are permuted with a precalculated permutation, and the net has to predict the index of that permutation in the permutation set.)

I am using a custom Permutator object to hold the permutations, which I load from a file.
This object is given to my wrapper class around torchvision.datasets.ImageFolder().
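
For context, the Permutation class is roughly the following (a simplified sketch; the method name random_permutation is just illustrative):

import csv
import random


class Permutation:
    """Holds a fixed set of tile permutations loaded from a CSV file."""

    def __init__(self, filename):
        with open(filename, newline="") as f:
            # each row of the file is one permutation of the tile indices 0..n*n-1
            self.permutations = [[int(x) for x in row] for row in csv.reader(f)]

    def random_permutation(self):
        # return the index of a randomly chosen permutation together with the permutation itself
        index = random.randrange(len(self.permutations))
        return index, self.permutations[index]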

When I set num_workers > 0 in torch.utils.data.DataLoader(), the whole script gets executed multiple times before each epoch.

Here is some pseudo-code of my setup:

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torchvision import datasets, transforms
from pathlib import Path

import permutation
import jigsaw_model


torch.backends.cudnn.benchmark = True
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define hyperparameters
image_size = (99, 99)
batch_size = 256
n = 3
lr = 10e-3
n_epochs = 100

# Create custom permutation object
# This object holds a list of 100 permutations and can select one at random
permutator = permutation.Permutation(filename=Path("data", "permutations_max_100.csv"))
print("Permutator created.")


# Training transforms
transforms_train = transforms.Compose([
    ... # Normalize the image and cut it into n*n puzzle tiles
])

# Image files
images_train = datasets.ImageFolder(root="my_path/to/images", transform=None)

# This is just a wrapper on torchvision.datasets.ImageFolder(). It loads an image, cuts it into n*n puzzle tiles
#  and permutes the tiles with a permutation obtained from the permutator object
dataset_train = jigsaw_model.JigsawTileDataset(dataset=images_train, transform=transforms_train, n=n,
                                               permutator=permutator)

# If I set num_workers > 0 here, all the above code is executed multiple times
num_workers = 0
loader_train = torch.utils.data.DataLoader(dataset_train, num_workers=num_workers, shuffle=True, pin_memory=True,
                                           batch_size=batch_size)

# Define the model
model = ...

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

if __name__ == '__main__':
    for epoch in range(n_epochs):
        train(...)
        validate(...)
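
For reference, the wrapper's __getitem__ does roughly the following (simplified sketch; it assumes the Permutation sketch above and that the transform returns a tensor of n*n tiles):

from torch.utils.data import Dataset


class JigsawTileDataset(Dataset):
    """Wraps an ImageFolder dataset: cuts each image into n*n tiles and permutes them."""

    def __init__(self, dataset, transform, n, permutator):
        self.dataset = dataset
        self.transform = transform
        self.n = n
        self.permutator = permutator

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, _ = self.dataset[idx]
        tiles = self.transform(image)                      # tensor of shape (n*n, C, h, w)
        label, perm = self.permutator.random_permutation()
        tiles = tiles[perm]                                 # reorder the tiles
        return tiles, label                                 # label = index in the permutation set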

A minimal working example can be found here.

How often the code is executed can be tracked via the print("Permutator created.") output.
It seems to be executed num_workers + 1 times, which makes sense: once for the base process and once for each worker.
Nevertheless, I am puzzled by this behaviour: I want to avoid creating the Permutator object multiple times before each epoch, but I still want to use multiple workers for speed.
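
For illustration, the effect already shows up with a toy script like this (hypothetical ToyDataset, nothing jigsaw-specific):

import torch
from torch.utils.data import Dataset, DataLoader

print("module-level code executed")  # shows up once per process


class ToyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor(idx)


loader = DataLoader(ToyDataset(), batch_size=2, num_workers=3)

if __name__ == '__main__':
    for batch in loader:
        pass
    # on my machine the print above appears num_workers + 1 times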

Any help on this would be very much appreciated :slight_smile:

I am using the following setup:
PyTorch 0.4
Python 3.6
Windows 7

Whether I run the code from PyCharm or the Anaconda console, the same behaviour is observed.

Here is my output for num_workers=3:

Permutator created.
Permutator created.
Permutator created.
Epoch 0 [0/196]		Loss 4.6297	Acc 1.172	Time 2.37
Epoch 0 [20/196]		Loss 3.9302	Acc 13.672	Time 0.19
Epoch 0 [40/196]		Loss 3.2485	Acc 24.609	Time 0.11
Epoch 0 [60/196]		Loss 2.9858	Acc 32.812	Time 0.11
Epoch 0 [80/196]		Loss 2.6841	Acc 40.625	Time 0.12
Epoch 0 [100/196]		Loss 2.2454	Acc 44.922	Time 0.11
Epoch 0 [120/196]		Loss 2.1016	Acc 48.828	Time 0.11
Epoch 0 [140/196]		Loss 2.0078	Acc 50.781	Time 0.12
Epoch 0 [160/196]		Loss 2.1359	Acc 50.000	Time 0.11
Epoch 0 [180/196]		Loss 2.0075	Acc 49.609	Time 0.11

Finished Epoch 0 in 69.74s	Avg. Loss 2.6975	Avg. Accuracy 38.306

Permutator created.
Permutator created.
Permutator created.
Permutator created.
Epoch 1 [0/196]		Loss 1.6530	Acc 60.547	Time 0.33
Epoch 1 [20/196]		Loss 2.0716	Acc 53.125	Time 0.11
Epoch 1 [40/196]		Loss 1.8582	Acc 56.250	Time 0.12
Epoch 1 [60/196]		Loss 1.7714	Acc 58.594	Time 0.12
Epoch 1 [80/196]		Loss 1.5090	Acc 62.500	Time 0.18
Epoch 1 [100/196]		Loss 1.5322	Acc 63.281	Time 0.11
Epoch 1 [120/196]		Loss 1.4636	Acc 65.234	Time 0.12

You can see that after the first epoch the Permutator is created four times (twice for training and twice for validation).

I think it’s related to the multiprocessing behavior on Windows: Windows uses the spawn start method, so each worker process imports your main module again and all top-level code is executed again.
You should wrap your whole code in a function and guard it:

def run():
    for epoch in ...
        train(...)


if __name__=='__main__':
    run()
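
As a rough illustration of the mechanism (plain multiprocessing, nothing PyTorch-specific), a small sketch:

import multiprocessing as mp

print("imported")  # on Windows this also prints in every child process


def work():
    pass


if __name__ == '__main__':
    # the guard keeps the process creation itself from re-running in the children,
    # but module-level statements like the print above still execute on import
    p = mp.Process(target=work)
    p.start()
    p.join()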

I’m wondering why you don’t get a BrokenPipeError, which is usually thrown in such a case.
Could you try that?


Yeah, I’ll try that and tell you if it worked!

That was it! Thank you very much! :ok_hand:

For anyone who runs into the same issue: I had to wrap the whole code in a function to make the issue disappear. That means the above code example should be:

def run():
    torch.backends.cudnn.benchmark = True
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    ... 
    # (Load data, define loader and model)

    for epoch in ...
        train(...)


if __name__=='__main__':
    run()
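
Concretely, with the names from my snippet above, it ends up roughly like this (still abbreviated):

def run():
    torch.backends.cudnn.benchmark = True
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # everything that used to be at module level now only runs in the main process
    permutator = permutation.Permutation(filename=Path("data", "permutations_max_100.csv"))
    images_train = datasets.ImageFolder(root="my_path/to/images", transform=None)
    dataset_train = jigsaw_model.JigsawTileDataset(dataset=images_train, transform=transforms_train,
                                                   n=n, permutator=permutator)
    loader_train = torch.utils.data.DataLoader(dataset_train, num_workers=num_workers, shuffle=True,
                                               pin_memory=True, batch_size=batch_size)

    model = ...
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    for epoch in range(n_epochs):
        train(...)
        validate(...)


if __name__ == '__main__':
    run()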

I tried that, but I’m still getting the same error.