How to create same-size patches from images of different sizes?

I am trying to implement the NetVLAD neural network. Before implementing it, I have to do data preprocessing, which is to extract 128x128 and 68x68 patches from an image dataset. In order to prevent pixel loss, I have to pad the images first. I looked at the PyTorch API and figured out the steps:

  1. Use F.pad for padding
  2. Use the unfold API to extract patches
  3. Write a custom dataloader

I know the logic, but I don't know how to implement it. Can anyone help me out?

I think you could first preprocess your dataset: extract the patches and perform the padding with torchvision.transforms.functional.pad (link), then save the extracted, padded data to disk. After that you could write a custom dataset and dataloader to load the processed data.
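A minimal sketch of such a custom dataset and dataloader. The class name `PatchDataset` and the one-`.pt`-file-per-patch layout are my own illustrative choices, not something fixed by the question:

```python
import os
import torch
from torch.utils.data import Dataset, DataLoader

class PatchDataset(Dataset):
    """Loads patches that were saved to disk as individual .pt tensors.

    Assumes each file in `root` holds one patch of shape (C, H, W),
    saved beforehand with torch.save(patch, path).
    """
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f)
            for f in os.listdir(root) if f.endswith('.pt')
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Load one patch tensor from disk
        return torch.load(self.paths[idx])

# Usage (assuming patches were saved to a 'patches/' directory):
# loader = DataLoader(PatchDataset('patches'), batch_size=32, shuffle=True)
# for batch in loader:
#     ...  # batch has shape (batch_size, C, H, W)
```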

import torch
import torchvision.transforms.functional as F
from PIL import Image
import numpy as np
import cv2

img = cv2.imread('1.jpg')
print(img.shape) # 483x850x3

img1 = Image.open('1.jpg').convert('RGB')
               # left, top, right, bottom
img1 = F.pad(img1, (0, 0, 1, 1))
img1 = np.array(img1)
print(img1.shape) # 484x851x3 (1 px added on the right and bottom)
# Question1: Is this image padded for RGB channels respectively?

size = 128 # patch size
stride = 128 # patch stride

img2 = torch.from_numpy(img1)
patches = img2.unfold(0, size, stride).unfold(1, size, stride)
print(patches.shape) # torch.Size([3, 6, 3, 128, 128])
# Question2: what does [3, 6, 3] mean?

Thank you in advance!!!

  • Question1: yes, the padding is applied to each channel of the RGB image.
  • Question2: the shape of img2 is 484x851x3. When you unfold on dim 0, the shape of the intermediate result is 3x851x3x128: on dim 0 it unfolds 484 into 3 pieces with a patch size of 128 and a stride of 128 (floor((484 - 128) / 128) + 1 = 3). The final result 3x6x3x128x128 means that on dim 1 it splits 851 into 6 pieces.
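To make the arithmetic concrete, here is the same unfold sequence on a dummy tensor of the padded shape (the zeros tensor is just a stand-in for the image):

```python
import torch

x = torch.zeros(484, 851, 3)       # padded image, (H, W, C)

step1 = x.unfold(0, 128, 128)      # dim 0: (484 - 128) // 128 + 1 = 3 pieces
print(step1.shape)                 # torch.Size([3, 851, 3, 128])

step2 = step1.unfold(1, 128, 128)  # dim 1: (851 - 128) // 128 + 1 = 6 pieces
print(step2.shape)                 # torch.Size([3, 6, 3, 128, 128])

# Caveat: unfold silently drops what doesn't fit: 484 - 3*128 = 100 rows
# and 851 - 6*128 = 83 columns are lost. Padding by 1 pixel does not make
# the sides divisible by 128; pad up to the next multiple of the patch
# size if you really want to keep every pixel.
```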

You can then reshape the final result into [batch_size, channels, H, W] or whatever layout you need.
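For example, since the layout after the double unfold is (patch_rows, patch_cols, C, H, W), each patch already carries its channel dimension, so collapsing the first two dims gives a batch of patches (a sketch of one possible layout, not the only one):

```python
import torch

# Result of the double unfold above: (patch_rows, patch_cols, C, H, W)
patches = torch.zeros(3, 6, 3, 128, 128)

# Merge the 3x6 patch grid into a single batch dimension -> [N, C, H, W]
batch = patches.reshape(-1, 3, 128, 128)
print(batch.shape)  # torch.Size([18, 3, 128, 128])
```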