Dataset, DataLoader and their transformations: PIL vs Tensor, which one is faster?

What I know, or at least what I think I know

  1. When instantiating one of the VisionDataset classes from torchvision.datasets, such as MNIST, a transform argument can be provided, which can be built using torchvision.transforms.Compose with a list of transformations (see the sketch after this list).
  2. If the transform argument is provided to the chosen VisionDataset, it does not seem to be applied immediately by the object. Instead, it seems to be applied only when a torch.utils.data.DataLoader iterates through the dataset.
  3. I noticed from the documentation page on transforms that some transforms work on PIL images, some on Tensors, and some on both.
  4. It seems that only Tensors can be transferred to the GPU, not PIL images.
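
For concreteness, this is roughly the setup I have in mind (the root path and normalization constants are just placeholders):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Compose a pipeline: ToTensor converts the PIL image to a float Tensor,
# and Normalize then operates on that Tensor.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# The transform is only stored on the dataset object here...
train_set = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transform)

# ...and is executed per sample only when an item is actually fetched,
# e.g. while the DataLoader iterates.
loader = DataLoader(train_set, batch_size=64, shuffle=True)
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
```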

Questions I have

  1. Is it better, in terms of execution time, to have the transforms work only on Tensors instead of on PIL images?
  2. Would the Tensor-based transforms run on the GPU, while the PIL-based ones run only on the CPU?
  3. Is one type of input for the transforms better than the other? If so, why are there two types of transform at all?
  4. Additionally, I am setting pin_memory to True. I am not sure what a batch of a custom type is, but I don't think that applies to my case, so I am assuming that pin_memory is actually taking effect and making the memory -> GPU-memory transfer faster. Is that the case? If most of the out-of-the-box network examples fall into this category, why does pin_memory default to False? Am I missing something here? (A sketch of how I read the pinned-memory path follows this list.)
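
This is how I understand the pinned-memory path (the device and shapes are just for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A plain Tensor dataset, so the default collate_fn produces ordinary
# Tensor batches (no custom batch type involved).
dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                        torch.randint(0, 10, (1000,)))

# pin_memory=True asks the DataLoader to copy each collated batch into
# page-locked (pinned) host memory.
loader = DataLoader(dataset, batch_size=32, pin_memory=True, num_workers=2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for x, y in loader:
    # With pinned batches, non_blocking=True lets the host -> GPU copy
    # overlap with other work.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward / backward would go here ...
    break
```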

Thanks.

The trick here is that there is a drop-in replacement library for PIL; I don't recall the exact name right now, it's Pillow-SIMD or something similar. That library is compiled so that it takes advantage of the instructions available on your CPU.

One of the drawbacks of PyTorch's DataLoader is that it cannot run multiprocessing together with GPU processing. That makes the answer a bit hard: it's a trade-off which depends on the dataset, which GPU you have, which CPU... I would say that a good practice (at least for me) is to split the data loading into a CPU-intensive stage and a GPU-intensive stage.

If there is something which is really worth running on the GPU, for example an STFT or other really heavy ops, I just apply it after the batch is loaded.
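
A rough sketch of that split, assuming an audio-style dataset (all names and sizes here are made up):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WaveformDataset(Dataset):
    # CPU-intensive stage: loading / slicing happens per sample in __getitem__,
    # so the DataLoader workers can parallelize it.
    def __init__(self, num_items=256, num_samples=16000):
        self.data = torch.randn(num_items, num_samples)  # stand-in for files on disk

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]  # cheap CPU-side work only

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(WaveformDataset(), batch_size=16, num_workers=2, pin_memory=True)
window = torch.hann_window(400, device=device)

for batch in loader:
    batch = batch.to(device, non_blocking=True)
    # GPU-intensive stage: applied per batch, outside the DataLoader workers.
    spec = torch.stft(batch, n_fft=400, hop_length=160,
                      window=window, return_complex=True)
    # ... feed `spec` to the model ...
    break
```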

But once again, everything depends on your computer, the type of data, etcetera.

Thanks for the reply.

Nice! Good to know.


Is there a way to easily determine this?

I have an additional question.

In what stage are the transforms applied?

  1. As we get the DataLoader object; or
  2. As we get each batch, in which case the transform would be applied every time that batch is selected.

I noticed the transforms are applied as we iterate over the data with DataLoader.
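
A quick way to see this is to wrap a transform with a counter (the wrapper is just for illustration, and num_workers=0 keeps the counter visible in the main process):

```python
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import functional as TF

calls = {"n": 0}

def counting_to_tensor(img):
    # Side effect only to observe when the transform actually runs.
    calls["n"] += 1
    return TF.to_tensor(img)

ds = datasets.MNIST(root="./data", train=True, download=True,
                    transform=counting_to_tensor)
print(calls["n"])  # 0 -- nothing is applied when the dataset is built

loader = DataLoader(ds, batch_size=8, num_workers=0)
_ = next(iter(loader))
print(calls["n"])  # 8 -- applied once per sample while the batch was fetched
```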

I'm not really familiar with the prebuilt datasets.
For a general dataset, the per-sample workload has to go into the __getitem__ function.
This is the function which is called and parallelized by the DataLoader.
As I mentioned before, it only works with CPU computations. Therefore any GPU-side transformation should be applied after the batch is assembled.
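
A minimal sketch of that pattern (all names below are made up, not from a real project):

```python
import torch
from torch.utils.data import Dataset, DataLoader

def scale(t):
    return t * 2.0  # stand-in for a CPU-side, per-sample transform

class MyDataset(Dataset):
    def __init__(self, samples, transform=None):
        self.samples = samples          # e.g. file paths or raw arrays
        self.transform = transform      # CPU-side, per-sample transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx]           # load / decode one sample (CPU)
        if self.transform is not None:
            x = self.transform(x)       # applied here, in the worker process
        return x

ds = MyDataset(torch.randn(100, 3, 32, 32), transform=scale)
loader = DataLoader(ds, batch_size=10, num_workers=2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch in loader:
    batch = batch.to(device)
    # Batch-level, GPU-friendly work goes here, after the batch is assembled.
    batch = (batch - batch.mean()) / batch.std()
```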