Bottleneck on "decode", "resize", and "read"

Hello.

I’m training a regression task (output values between 0 and 100), and the inputs are images of plants. I’m using resnet18 from torchvision.

I noticed that GPU utilization peaks at about 40%, and most of the time it sits at 0%.
So I thought there was a bottleneck in the data loading/preprocessing steps, and used python3 -m torch.utils.bottleneck src/train.py to check.

Here are the results:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   121039  570.056    0.005  570.056    0.005 {method 'decode' of 'ImagingDecoder' objects}
    26068  180.834    0.007  180.834    0.007 {method 'resize' of 'ImagingCore' objects}
  1102194  104.741    0.000  104.741    0.000 {method 'read' of '_io.BufferedReader' objects}
    26068   14.408    0.001   14.408    0.001 {built-in method PIL._imaging.new}
    28838    7.336    0.000    7.336    0.000 {method 'to' of 'torch._C._TensorBase' objects}
     8160    5.703    0.001    5.703    0.001 {built-in method torch.conv2d}
    26071    4.836    0.000    4.836    0.000 {built-in method io.open}
    66454    4.492    0.000    4.492    0.000 {method 'item' of 'torch._C._TensorBase' objects}
      816    4.220    0.005    4.220    0.005 {built-in method torch.stack}
    26068    2.800    0.000    2.800    0.000 {method 'contiguous' of 'torch._C._TensorBase' objects}
    26068    2.656    0.000    2.656    0.000 {method 'close' of '_io.BufferedReader' objects}
      326    2.234    0.007    2.234    0.007 {method 'run_backward' of 'torch._C._EngineBase' objects}
    26068    2.161    0.000    2.161    0.000 {method 'div' of 'torch._C._TensorBase' objects}
    52136    1.247    0.000  590.483    0.011 /home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/PIL/ImageFile.py:155(load)
    52136    0.999    0.000    4.623    0.000 /home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/pandas/core/internals/managers.py:1027(fast_xs)

The “cumtime” shows high values for decode, resize, and read, but I don’t know where this decode is in my code. For resize I suppose it comes from torchvision's Resize() and for read it must be from PIL.

I would like to understand how I can make my model run faster from this output.

My dataset is as follows:

from PIL import Image

from torch.utils.data import Dataset
from torchvision import transforms


class CGIARDataset(Dataset):
    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform

    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        y = self.df.iloc[idx]['extent']
        img = Image.open(self.df.iloc[idx]['filename'])
        x = transforms.ToTensor()(img)
        if self.transform is not None:
            x = self.transform(x)
        return x, y

As you can see, I’m reading the image using PIL, and converting to tensor using ToTensor from torchvision.

The resize step is in my transform object:

transform = transforms.Compose([
    transforms.Resize(IMG_SIZE, antialias=True)
])

Could anyone give me some tips on this?
For example, should I change my image reading from PIL to another library?
Or where does the decode come from?

Thanks!
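
On the question of where decode comes from: as far as I know, Image.open is lazy and only parses the file header. The pixel data is decoded later, when something calls load() on the image, which ToTensor does implicitly. That would match the ImageFile.py:155(load) row in the profile, whose cumtime (about 590 s) is almost entirely the decode time (about 570 s). Below is a minimal sketch that times the two steps separately. It uses a synthetic in-memory JPEG as a stand-in for the real files; the image size and the helper names make_jpeg_bytes and time_open_and_load are made up for illustration.

```python
import io
import time

from PIL import Image


def make_jpeg_bytes(size=(2000, 1500)):
    """Create an in-memory JPEG as a stand-in for a real dataset image."""
    buf = io.BytesIO()
    Image.new("RGB", size, color=(30, 120, 60)).save(buf, format="JPEG")
    return buf.getvalue()


def time_open_and_load(jpeg_bytes):
    """Return (image, seconds spent in open, seconds spent in load)."""
    t0 = time.perf_counter()
    img = Image.open(io.BytesIO(jpeg_bytes))  # reads the header, no pixel data yet
    t1 = time.perf_counter()
    img.load()  # the decoder runs here
    t2 = time.perf_counter()
    return img, t1 - t0, t2 - t1


img, open_s, load_s = time_open_and_load(make_jpeg_bytes())
print(f"open: {open_s * 1e3:.2f} ms, load/decode: {load_s * 1e3:.2f} ms")
```

Running the same timing on a few of the real files should show how much of each __getitem__ call is spent decoding the full-size image before Resize even starts.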

If multiple workers in the DataLoader don’t help, you could replace PIL with PIL-SIMD, which could accelerate the decoding and transformation steps.
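
For reference, here is a minimal sketch of what "multiple workers" means in practice. ToyDataset is a hypothetical stand-in for CGIARDataset that returns random tensors, and the batch size and worker count are placeholder values, not settings taken from this thread.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for CGIARDataset; returns random tensors."""

    def __init__(self, n=64):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        x = torch.rand(3, 32, 32)           # fake image tensor
        y = torch.tensor(float(idx % 101))  # fake target in [0, 100]
        return x, y


def make_loader(dataset, batch_size=16, num_workers=4):
    """Build a DataLoader that prepares batches in background processes."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,                # parallel decode/resize on the CPU
        pin_memory=torch.cuda.is_available(),   # only useful when copying to a GPU
        persistent_workers=num_workers > 0,     # keep workers alive between epochs
    )


loader = make_loader(ToyDataset(), num_workers=2)
print(len(loader))  # number of batches per epoch
```

With num_workers > 0 each worker runs __getitem__ in its own process, so decoding and resizing overlap with the GPU work. A reasonable starting point is to try a few values up to the number of CPU cores allocated to the job and keep whichever gives the shortest epoch.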

Thanks for your reply!

I installed PIL-SIMD as recommended here but I got the following error when running my training script:

Traceback (most recent call last):
  File "src/train_coral.py", line 9, in <module>
    from torchvision import transforms
  File "/home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import datasets, io, models, ops, transforms, utils
  File "/home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/torchvision/datasets/__init__.py", line 1, in <module>
    from ._optical_flow import FlyingChairs, FlyingThings3D, HD1K, KittiFlow, Sintel
  File "/home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/torchvision/datasets/_optical_flow.py", line 10, in <module>
    from PIL import Image
  File "/home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/PIL/Image.py", line 89, in <module>
    from . import _imaging as core
ImportError: /home/igorf/.conda/envs/my-env/lib/python3.8/site-packages/PIL/_imaging.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyObject_CheckBuffer

Do you know what the problem is?

I’m using an HPC cluster (CentOS 7).

No, unfortunately I haven’t seen this error before and don’t know what might be causing it. Note that I’m using PIL-SIMD myself and didn’t run into any issues.

Did you install with the “AVX2-enabled version” as stated here?

Yes, I pass CC="cc -mavx" to the build command.

I also passed the flag to install the AVX2-enabled version, but it didn’t work.

To give you some updates:
I managed to solve the problem by saving resized images to the disk using Image.open(path).resize(IMG_SIZE) from PIL.

Now the training script is reading directly from a folder with resized images and it’s way faster: before it was ~4min per epoch, and now it’s roughly 1min, which helped a lot to experiment with new ideas.
Moreover, the GPU utilization is good now, so I think the resizing step was what was slowing down my training.
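
For future readers, the one-off resize pass can be sketched roughly like this. The IMG_SIZE value, the "*.jpg" pattern, and the function name resize_folder are assumptions for illustration, not my exact script. The bottom part only exercises the function on synthetic images in a temporary directory.

```python
import tempfile
from pathlib import Path

from PIL import Image

IMG_SIZE = (224, 224)  # assumed (width, height); not the value from my project


def resize_folder(src_dir, dst_dir, size=IMG_SIZE):
    """Resize every .jpg in src_dir once and save it under dst_dir with the same name."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for src_path in sorted(src_dir.glob("*.jpg")):
        with Image.open(src_path) as img:
            img.convert("RGB").resize(size).save(dst_dir / src_path.name)
        written += 1
    return written


# Self-check on synthetic images in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "raw", Path(tmp) / "resized"
    src.mkdir()
    for i in range(3):
        Image.new("RGB", (640, 480)).save(src / f"img_{i}.jpg")
    n = resize_folder(src, dst)
    with Image.open(dst / "img_0.jpg") as out:
        out_size = out.size
    print(n, out_size)
```

One thing to watch: PIL’s resize takes a (width, height) tuple, whereas transforms.Resize with a single int scales the shorter edge and keeps the aspect ratio, so the two are not interchangeable if IMG_SIZE is an int. The dataset then just points at the resized folder and the Resize transform can be dropped.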

Thanks for your help and I also hope this helps future readers.
