Hangs when generating a dataset

Andrew1 · January 20, 2022, 3:13pm

I have the following class for building a dataset:

import random

import numpy as np
import PIL.ImageOps
import torch
from PIL import Image
from torch.utils.data import Dataset


class SiameseNetworkDataset(Dataset):

    def __init__(self, imageFolderDataset, transform=None, should_invert=True):
        self.imageFolderDataset = imageFolderDataset
        self.transform = transform
        self.should_invert = should_invert

        np.random.seed(123)
        random.seed(123)

        # Defines a set of random logos by index (for each epoch)
        self.random_indexes = np.random.randint(
            len(self.imageFolderDataset.imgs),
            size=int((len(self.imageFolderDataset.imgs)) / Config.train_batch_size) + 1)

    def __getitem__(self, index):

        # reset the indexes every epoch
        if index == 0:
            self.random_indexes = np.random.randint(len(self.imageFolderDataset.imgs), size=int(
                (len(self.imageFolderDataset.imgs)) / Config.train_batch_size) + 1)

        # get the index for the current batch
        img0_tuple = self.imageFolderDataset.imgs[self.random_indexes[int(index/Config.train_batch_size)]]

        # we need to make sure approx 50% of images are in the same class
        should_get_same_class = random.randint(0, 1)

        # Search for class by looping random indexes
        if should_get_same_class:
            while True:
                # keep looping till the same class image is found
                img1_tuple = random.choice(self.imageFolderDataset.imgs)
                if img0_tuple[1] == img1_tuple[1]:
                    break
        else:
            while True:
                # keep looping till a different class image is found
                img1_tuple = random.choice(self.imageFolderDataset.imgs)
                if img0_tuple[1] != img1_tuple[1]:
                    break

        img0 = Image.open(img0_tuple[0])
        img1 = Image.open(img1_tuple[0])
        #img0 = img0.convert("L")
        #img1 = img1.convert("L")

        if self.should_invert:
            img0 = PIL.ImageOps.invert(img0)
            img1 = PIL.ImageOps.invert(img1)

        if self.transform is not None:
            img0 = self.transform(img0)
            img1 = self.transform(img1)

        # label is 1.0 when the two images come from different classes, else 0.0
        return img0, img1, torch.from_numpy(np.array([int(img1_tuple[1] != img0_tuple[1])], dtype=np.float32))

    def __len__(self):
        return len(self.imageFolderDataset.imgs)

This code runs instantly:

folder_dataset = torchvision.datasets.ImageFolder(root="./openlogo_100")

siamese_dataset = SiameseNetworkDataset(
    imageFolderDataset=folder_dataset,
    transform=transforms.Compose([transforms.Resize((Config.im_w, Config.im_h)), transforms.ToTensor()]))

train_dataloader = DataLoader(siamese_dataset, pin_memory=False, shuffle=False)

But this code never finishes:

for batch_idx, samples in enumerate(train_dataloader):
    print(batch_idx)

Sometimes the console shows 1, 2, 3, or 1, 2, or just 1, but it never goes beyond 3.

Often there is no output at all. Meanwhile, the CPU is loaded at about 20%, and the load does not go away after the kernel is restarted.

But when a number does appear in the console, I stop the execution and run the following code:

np.shape(samples)

In this case I see what looks like a normal sample from the dataset: [tensor(img_1), tensor(img_2), tensor(0 or 1)]

What is the problem? I tried changing the num_workers, pin_memory, and shuffle parameters, and I also tried running it on a macOS computer. Nothing helped.

ptrblck · January 20, 2022, 11:20pm

Remove the while True loops and check whether the “hang” is still there, or add a debug print message inside each loop to see whether your script is spinning there.
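
A minimal sketch of the debug-print idea, assuming the partner search is pulled out of `__getitem__` into a standalone helper. The name `pick_partner` and the parameters `log_every` and `max_tries` are hypothetical and not part of the original code. It keeps the same rejection sampling over `(path, label)` tuples, but counts attempts, prints a message periodically, and raises instead of spinning forever:

```python
import random


def pick_partner(imgs, anchor, same_class, rng=random, log_every=1000, max_tries=100_000):
    """Rejection-sample a partner for `anchor` from `imgs`.

    `imgs` is a list of (path, label) tuples, like ImageFolder.imgs.
    Returns (partner_tuple, number_of_attempts).
    """
    for attempt in range(1, max_tries + 1):
        candidate = rng.choice(imgs)
        # Accept when the label relationship matches what was requested.
        if (candidate[1] == anchor[1]) == same_class:
            return candidate, attempt
        # Debug message: shows whether the script is spinning in this loop.
        if attempt % log_every == 0:
            print(f"still searching: attempt={attempt}, "
                  f"anchor_label={anchor[1]}, same_class={same_class}")
    raise RuntimeError(
        f"no partner found after {max_tries} attempts "
        f"(anchor_label={anchor[1]}, same_class={same_class})")


if __name__ == "__main__":
    # Toy data standing in for ImageFolder.imgs (hypothetical file names).
    toy_imgs = [("a.png", 0), ("b.png", 0), ("c.png", 1)]
    partner, tries = pick_partner(toy_imgs, toy_imgs[0], same_class=False)
    print(partner, tries)
```

Inside `__getitem__`, the two `while True` blocks could then be replaced with something like `img1_tuple, _ = pick_partner(self.imageFolderDataset.imgs, img0_tuple, bool(should_get_same_class))`. If neither the periodic message nor the `RuntimeError` ever shows up, the time is probably being spent elsewhere, for example in image loading or the transforms.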