# How to create same-size patches from images of different sizes?

I am trying to implement the NetVLAD neural network. Before implementing it, I have to do data preprocessing: extracting 128x128 and 68x68 patches from an image dataset. To prevent pixel loss, I have to pad the images first. I looked at the PyTorch API and figured out the steps:

1. Using the pad API to pad the image
2. Using the unfold API to extract patches

I know the logic, but I don't know how to implement it. Can anyone help me out?

I think you could first preprocess your dataset: pad the images with `torchvision.transforms.functional.pad` (link), extract the patches, and save the padded and extracted data to disk. Then you could write a custom `dataset` and `dataloader` to load your processed data.
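As a rough sketch of that custom dataset (assuming each patch was saved as an individual `.pt` tensor file in one directory; the directory layout and file naming here are hypothetical):

```python
import os
import torch
from torch.utils.data import Dataset, DataLoader

class PatchDataset(Dataset):
    """Loads pre-extracted patches saved as .pt files (hypothetical layout)."""
    def __init__(self, patch_dir):
        self.paths = sorted(
            os.path.join(patch_dir, f)
            for f in os.listdir(patch_dir)
            if f.endswith('.pt')
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # each file is assumed to hold one patch tensor
        return torch.load(self.paths[idx])

# usage: loader = DataLoader(PatchDataset('patches/'), batch_size=32, shuffle=True)
```

You could equally store all patches of one image in a single file and index into it; the per-file variant just keeps `__getitem__` trivial.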

```python
import torch
import torchvision.transforms.functional as F
from PIL import Image
import numpy as np

img1 = Image.open('1.jpg').convert('RGB')
print(np.array(img1).shape)  # 483x850x3

# padding order: left, top, right, bottom
img1 = F.pad(img1, (0, 0, 1, 1))
img1 = np.array(img1)
print(img1.shape)  # 484x851x3
# Question 1: is the image padded for each RGB channel separately?

size = 128    # patch size
stride = 128  # patch stride

img2 = torch.from_numpy(img1)
patches = img2.unfold(0, size, stride).unfold(1, size, stride)
print(patches.shape)  # torch.Size([3, 6, 3, 128, 128])
# Question 2: what does [3, 6, 3] mean?
```
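Note that padding by only one pixel still loses pixels here, since 484 and 851 are not multiples of 128 and `unfold` silently drops the remainder. One option (a sketch, not from the original post; `pad_to_multiple` is a hypothetical helper) is to compute the padding so that height and width become multiples of the patch size before unfolding:

```python
import torch

def pad_to_multiple(img, size):
    """Zero-pad an HxWxC tensor on the right/bottom so that H and W
    become multiples of `size`, so unfold drops no pixels."""
    h, w = img.shape[0], img.shape[1]
    pad_h = (size - h % size) % size
    pad_w = (size - w % size) % size
    # F.pad pads dimensions from the last one backwards:
    # (C_left, C_right, W_left, W_right, H_top, H_bottom)
    return torch.nn.functional.pad(img, (0, 0, 0, pad_w, 0, pad_h))

img = torch.randn(483, 850, 3)
padded = pad_to_multiple(img, 128)
print(padded.shape)   # torch.Size([512, 896, 3])
patches = padded.unfold(0, 128, 128).unfold(1, 128, 128)
print(patches.shape)  # torch.Size([4, 7, 3, 128, 128])
```

With this padding every pixel of the original 483x850 image ends up in exactly one patch.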

• Question 2: the shape of `img2` is 484x851x3. When you unfold along dim 0, the intermediate result has shape 3x851x3x128, meaning that along dim 0 the 484 rows are unfolded into 3 pieces with a patch size of 128 and a stride of 128. The final shape 3x6x3x128x128 then means that along dim 1 the 851 columns are split into 6 pieces; the remaining 3 is the channel dimension, and the last two dimensions are the patch height and width.
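The patch counts follow the standard sliding-window formula, floor((L - size) / stride) + 1, which you can check directly:

```python
import torch

def num_windows(length, size, stride):
    # number of full windows that unfold produces along one dimension
    return (length - size) // stride + 1

print(num_windows(484, 128, 128))  # 3 pieces along dim 0 (height)
print(num_windows(851, 128, 128))  # 6 pieces along dim 1 (width)

img = torch.zeros(484, 851, 3)
patches = img.unfold(0, 128, 128).unfold(1, 128, 128)
print(patches.shape)  # torch.Size([3, 6, 3, 128, 128])
```

The leftover rows (484 - 3*128 = 100) and columns (851 - 6*128 = 83) are simply discarded by `unfold`, which is why padding to a multiple of the patch size matters.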