Hello there,
According to the following torchvision release notes, transformations can now be applied directly to tensors, including batched tensors. The notes say:
torchvision transforms are now inherited from nn.Module and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with batch dimension and work seamlessly on CPU/GPU devices
Here is a snippet:
import torch
import torchvision.transforms as T

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
)
tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
# works directly on Tensors
out_image1 = transforms(tensor_image)
# on the GPU
out_image1_cuda = transforms(tensor_image.cuda())
# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)
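The notes also say the pipeline can be torchscripted; a minimal continuation of the snippet above:

# the quoted notes state the transforms can be torchscripted
scripted_transforms = torch.jit.script(transforms)
out_image_scripted = scripted_transforms(tensor_image)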
How can we use this to improve our dataloader's performance?
Loading and transforming ImageNet images takes approximately 5 s per 256-image batch on my machine. Given that I will move the input tensors to the device anyway, why not apply the transformations after that step?
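For context, here is roughly how that number can be measured (a minimal sketch, assuming a standard torch.utils.data.DataLoader named dataloader with batch_size=256):

import time

start = time.perf_counter()
for i, (input, target) in enumerate(dataloader):
    # time spent loading and transforming one batch on the CPU
    print(f"batch {i}: {time.perf_counter() - start:.2f} s")
    start = time.perf_counter()
    if i == 10:  # a few batches give a rough estimate
        break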
The typical workflow is to define a transform composition and attach it to a torch Dataset. The composition converts PIL images to tensors in its final step, as follows:
import torchvision.transforms as T

transform = T.Compose([
    T.Resize(...),
    T.CenterCrop(...),
    T.RandomHorizontalFlip(...),
    # And finally
    T.ToTensor(),
])
This doesn’t make use of the fact that transformations can be applied on device tensors directly.
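To take advantage of it, the dataset would first have to return raw uint8 tensors instead of PIL images; a minimal sketch, assuming torchvision.io.read_image is available (image_paths and labels are hypothetical placeholders):

import torchvision.transforms as T
from torch.utils.data import Dataset
from torchvision.io import read_image

class RawImageDataset(Dataset):
    # Hypothetical sketch: returns uint8 tensors so the remaining
    # transforms can be applied later on the GPU.
    def __init__(self, image_paths, labels):
        self.image_paths = image_paths  # hypothetical list of file paths
        self.labels = labels
        # resize on the CPU so all images share a size and can be batched
        self.resize = T.Resize((256, 256))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = read_image(self.image_paths[idx])  # uint8 tensor, shape (C, H, W)
        return self.resize(image), self.labels[idx]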
My thought is to apply the transformations inside the data-loading loop, as follows:
for i, (input, target) in enumerate(dataloader):
    # move data to GPU
    input = input.to(device)
    target = target.to(device)
    # APPLY transformations (the tensor-based nn.Sequential pipeline,
    # not the PIL-based Compose above)
    input = transforms(input)
    # feed to model
    output = model(input)
However, this approach is not as clean and seamless as simply attaching a transform to a dataset, though I am afraid something like it is the only possible approach.
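The only slightly cleaner variation I can think of, assuming this is valid usage: since the transforms now inherit from nn.Module, they could be folded into the model itself so the training loop stays untouched. A hedged sketch, reusing the transforms pipeline and an existing model:

import torch.nn as nn

# prepend the tensor-based transforms to the model; note that random
# augmentations would then also run at eval time unless gated somehow
model_with_preproc = nn.Sequential(transforms, model).to(device)

for i, (input, target) in enumerate(dataloader):
    output = model_with_preproc(input.to(device))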
I am also not sure whether moving the transformations to the GPU will actually speed up loading. Will it?
Any thoughts?