Method 'poll' of 'select.poll' takes ~50% of script time

Hello,

I developed a notebook to classify images with ResNet34. When I profiled a single epoch (with the %%prun cell magic), I saw this:

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      478   34.345    0.072   34.345    0.072 {method 'poll' of 'select.poll' objects}
    17208    4.272    0.000    4.272    0.000 {built-in method conv2d}
      179    3.846    0.021    3.846    0.021 {method 'run_backward' of 'torch._C._EngineBase' objects}
    18164    3.034    0.000    3.034    0.000 {built-in method batch_norm}
     1420    2.723    0.002    2.723    0.002 {method 'to' of 'torch._C._TensorBase' objects}
    63366    2.361    0.000    2.361    0.000 {method 'add_' of 'torch._C._TensorBase' objects}
      179    2.177    0.012    8.790    0.049 adam.py:49(step)
    42244    1.678    0.000    1.678    0.000 {method 'mul_' of 'torch._C._TensorBase' objects}
    18164    1.258    0.000    5.208    0.000 batchnorm.py:84(forward)
     7648    1.092    0.000   12.808    0.002 resnet.py:57(forward)

The first line seems quite strange to me: what does “select.poll” represent?
For reference, I am using the following data-loading code:
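From what I understand (please correct me if I'm wrong), select.poll is the system call the DataLoader's main process uses to wait on the pipes coming from its worker processes, so time spent there is mostly time spent waiting for workers to deliver batches. A minimal stdlib-only sketch of that pattern (the names and the delay are made up for illustration, this is not PyTorch code):

```python
# Sketch: a "main process" polling a pipe while a slow "worker" prepares data.
# If the worker is slow, the wall-clock time lands inside poll(), which is
# what a profiler then reports as {method 'poll' of 'select.poll' objects}.
import os
import select
import threading
import time

r, w = os.pipe()  # stands in for the worker -> main-process pipe


def slow_worker():
    time.sleep(0.2)        # simulates slow batch preparation (I/O, transforms)
    os.write(w, b"batch")  # "batch is ready"


threading.Thread(target=slow_worker).start()

poller = select.poll()
poller.register(r, select.POLLIN)

t0 = time.perf_counter()
events = poller.poll(5000)  # blocks here until the "worker" delivers data
waited = time.perf_counter() - t0
print(f"time spent blocked in poll: {waited:.2f}s")  # roughly the 0.2s delay
```

So (if my understanding is right) the 34 s in poll would mean the main process is data-starved, not that poll itself is doing work.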

def get_split(dataset, batch_size, num_workers):
    targets = dataset.targets
    train_idx, valid_idx = train_test_split(np.arange(len(targets)),
                                            test_size=0.2,
                                            shuffle=True,
                                            stratify=targets,
                                            random_state=42)
    tr_trans = torchvision.transforms.Compose([
            torchvision.transforms.CenterCrop(224),
            torchvision.transforms.RandomHorizontalFlip(),
            torchvision.transforms.RandomVerticalFlip(),
            torchvision.transforms.ColorJitter(contrast=0.3, brightness=0.3, saturation=0.1, hue=0.1),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            AddGaussianNoise(0.15, 0.00001),
        ])
    vl_trans = torchvision.transforms.Compose([
            torchvision.transforms.CenterCrop(224),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])
    
    tr_dataset = torch.utils.data.Subset(dataset, train_idx)
    val_dataset = torch.utils.data.Subset(dataset, valid_idx)
    
    tr_dataset_tf = MapDataset(tr_dataset, tr_trans)
    val_dataset = MapDataset(val_dataset, vl_trans)
    
    train_loader = torch.utils.data.DataLoader(tr_dataset_tf, batch_size=batch_size, num_workers=num_workers, pin_memory=False)
    valid_loader = torch.utils.data.DataLoader(val_dataset,   batch_size=batch_size, num_workers=num_workers, pin_memory=False)

    return train_loader, valid_loader

def get_dataloader(dataset_name, batch_size, num_workers=4):
    if dataset_name == "train":
        trans = None
    else:
        trans = torchvision.transforms.Compose([
            torchvision.transforms.CenterCrop(224),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ]) 

    dataset = torchvision.datasets.ImageFolder(
        root=image_path + dataset_name,
        transform=trans
    )

    if dataset_name == "train":
        train, validation2 = get_split(dataset, batch_size, num_workers)
        return train, validation2
    else:
        loader = torch.utils.data.DataLoader(
            dataset,
            batch_size=batch_size,
            num_workers=num_workers,
            shuffle=False,
            pin_memory=False
            )
    return loader

The profiling above refers to the following configuration:

{
"num_workers": 4,
"pin_memory": False
}

After some trials, I noticed that decreasing num_workers increases the time “poll” takes (as I expected). I also noticed that when I set pin_memory=True, the profiler output becomes:
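As a side note, to double-check that the loader (and not the model) is the bottleneck, I timed the batch fetches separately from the compute. A small stdlib-only helper I use for that (the function name is mine, it works on any iterable, not just a DataLoader):

```python
# Measure how long the training loop spends *waiting* for each batch,
# independently of the forward/backward pass that follows.
import time


def average_batch_wait(loader, n=10):
    """Return the mean seconds spent waiting for each of the first n batches."""
    it = iter(loader)
    waits = []
    for _ in range(n):
        t0 = time.perf_counter()
        next(it)  # blocks until the next batch is ready
        waits.append(time.perf_counter() - t0)
    return sum(waits) / len(waits)
```

With a fast in-memory iterable this returns effectively zero; with my actual train_loader the per-batch wait is large, which matches the poll time above.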

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3286   42.325    0.013   42.325    0.013 {method 'acquire' of '_thread.lock' objects}
      179    3.636    0.020    3.636    0.020 {method 'run_backward' of 'torch._C._EngineBase' objects}
    17208    3.524    0.000    3.524    0.000 {built-in method conv2d}
    18164    2.694    0.000    2.694    0.000 {built-in method batch_norm}
    63366    2.333    0.000    2.333    0.000 {method 'add_' of 'torch._C._TensorBase' objects}
      179    2.151    0.012    8.582    0.048 adam.py:49(step)
    42244    1.605    0.000    1.605    0.000 {method 'mul_' of 'torch._C._TensorBase' objects}
    18164    1.125    0.000    4.501    0.000 batchnorm.py:84(forward)

Any ideas why this happens?
Thank you all.
