# How to create same-size patches from images of different sizes?

I am trying to implement the NetVLAD neural network. Before implementing it, I have to do data preprocessing: extracting 128x128 and 68x68 patches from an image dataset. To prevent pixel loss, I have to pad the images first. I looked at the PyTorch API and figured out the steps:

1. Using the pad API to pad the image
2. Using the unfold API to extract patches

I know the logic, but I don't know how to implement it. Can anyone help me out?

I think you could first preprocess your dataset: pad the images with `torchvision.transforms.functional.pad` (link), extract the patches, and save the padded and extracted data to disk. Then you could write a custom `dataset` and `dataloader` to load your processed data.
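As a rough sketch of that custom dataset (assuming each patch was saved as an individual `.pt` tensor file in one directory; the directory layout and file naming here are hypothetical):

```python
import os
import torch
from torch.utils.data import Dataset, DataLoader

class PatchDataset(Dataset):
    """Loads pre-extracted patches saved as .pt files (hypothetical layout)."""
    def __init__(self, patch_dir):
        self.paths = sorted(
            os.path.join(patch_dir, f)
            for f in os.listdir(patch_dir)
            if f.endswith('.pt')
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # each file is assumed to hold one patch tensor
        return torch.load(self.paths[idx])

# usage: loader = DataLoader(PatchDataset('patches/'), batch_size=32, shuffle=True)
```

You could equally store all patches of one image in a single file and index into it; the per-file variant just keeps `__getitem__` trivial.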

```python
import torch
import torchvision.transforms.functional as F
from PIL import Image
import numpy as np

img1 = Image.open('1.jpg').convert('RGB')
print(np.array(img1).shape)  # 483x850x3

# padding order: left, top, right, bottom
img1 = F.pad(img1, (0, 0, 1, 1))
img1 = np.array(img1)
print(img1.shape)  # 484x851x3
# Question 1: is the image padded for each RGB channel separately?

size = 128    # patch size
stride = 128  # patch stride

img2 = torch.from_numpy(img1)
patches = img2.unfold(0, size, stride).unfold(1, size, stride)
print(patches.shape)  # torch.Size([3, 6, 3, 128, 128])
# Question 2: what does [3, 6, 3] mean?
```
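Note that padding by only one pixel still loses pixels here, since 484 and 851 are not multiples of 128 and `unfold` silently drops the remainder. One option (a sketch, not from the original post; `pad_to_multiple` is a hypothetical helper) is to compute the padding so that height and width become multiples of the patch size before unfolding:

```python
import torch

def pad_to_multiple(img, size):
    """Zero-pad an HxWxC tensor on the right/bottom so that H and W
    become multiples of `size`, so unfold drops no pixels."""
    h, w = img.shape[0], img.shape[1]
    pad_h = (size - h % size) % size
    pad_w = (size - w % size) % size
    # F.pad pads dimensions from the last one backwards:
    # (C_left, C_right, W_left, W_right, H_top, H_bottom)
    return torch.nn.functional.pad(img, (0, 0, 0, pad_w, 0, pad_h))

img = torch.randn(483, 850, 3)
padded = pad_to_multiple(img, 128)
print(padded.shape)   # torch.Size([512, 896, 3])
patches = padded.unfold(0, 128, 128).unfold(1, 128, 128)
print(patches.shape)  # torch.Size([4, 7, 3, 128, 128])
```

With this padding every pixel of the original 483x850 image ends up in exactly one patch.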

• Question 2: the shape of `img2` is 484x851x3. When you unfold along dim 0, the intermediate result has shape 3x851x3x128, meaning that along dim 0 the 484 rows are unfolded into 3 pieces with a patch size of 128 and a stride of 128. The final shape 3x6x3x128x128 then means that along dim 1 the 851 columns are split into 6 pieces; the remaining 3 is the channel dimension, and the last two dimensions are the patch height and width.
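The patch counts follow the standard sliding-window formula, floor((L - size) / stride) + 1, which you can check directly:

```python
import torch

def num_windows(length, size, stride):
    # number of full windows that unfold produces along one dimension
    return (length - size) // stride + 1

print(num_windows(484, 128, 128))  # 3 pieces along dim 0 (height)
print(num_windows(851, 128, 128))  # 6 pieces along dim 1 (width)

img = torch.zeros(484, 851, 3)
patches = img.unfold(0, 128, 128).unfold(1, 128, 128)
print(patches.shape)  # torch.Size([3, 6, 3, 128, 128])
```

The leftover rows (484 - 3*128 = 100) and columns (851 - 6*128 = 83) are simply discarded by `unfold`, which is why padding to a multiple of the patch size matters.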