How to extract patches from an image

Given an image, I need to extract patches with a stride of 32, after resizing the smaller dimension of the image to 227.

How can I do that?

You could use unfold as described in this post.
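
For completeness, here is a minimal sketch of that idea (not necessarily the exact code from the linked post, and it assumes 32x32 patches, since the question only specifies the stride):

from PIL import Image
from torchvision import transforms

img = Image.open("cat.jpg")   # example file name
x = transforms.Compose([
    transforms.Resize(227),   # with an int, resizes the smaller edge to 227 and keeps the aspect ratio
    transforms.ToTensor(),    # PIL image -> float tensor of shape [3, H, W] in [0, 1]
])(img)

size = 32    # patch size (assumed)
stride = 32  # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride)  # unfold height and width only
print(patches.shape)  # [3, n_h, n_w, 32, 32]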

I get this error:

RuntimeError: maximum size for tensor at dimension 3 is 3 but size is 32

with this code:

S = 128 # channel dim
W = 227 # width
H = 227 # height
batch_size = 10

x = image_new.unsqueeze(0)

size = 32 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)

The following code works for me:

import torch

S = 128 # channel dim
W = 227 # width
H = 227 # height
batch_size = 10

x = torch.randn(batch_size, S, H, W)

size = 32 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)

What shape does x have after the unsqueeze operation?

I first loaded the image like so:

import numpy as np
import cv2
import torch

img = cv2.imread('cat.jpg', 1)
image_new = torch.from_numpy(img)

and then I used the code above

and the size of x is:

torch.Size([1, 576, 1024, 3])

dim3 has only a size of 3, so you cannot unfold it with a kernel size of 32.
I guess you would like to unfold only in dim1 and dim2?
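
For example, a minimal sketch assuming a channels-last tensor with the shape you posted ([1, 576, 1024, 3]):

import torch

x = torch.randn(1, 576, 1024, 3)  # stand-in for your loaded image tensor
size = 32    # patch size
stride = 32  # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride)  # unfold only dim1 (H) and dim2 (W)
print(patches.shape)  # torch.Size([1, 18, 32, 3, 32, 32])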

Unrelated to this, but note that PyTorch expects image tensors in the shape [batch_size, channels, height, width].

So the method of loading the image using OpenCV and then converting it to a tensor is wrong? Is there another way to convert the image to a tensor so that it has the shape you specified?

Also, when I unfold only dim1 and dim2, I get this size:

torch.Size([1, 18, 32, 3, 32, 32])

and when I show the image using plt, there’s still something wrong

You could use PIL to load the image and then torch.from_numpy to create the tensor or alternatively use OpenCV, transform the image from BGR to RGB, and permute the dimensions via image = image.permute(2, 0, 1).

The first solution gave me the same dimensions as before; also, I had to use different code, like so:

image = torch.as_tensor(np.array(image).astype('float'))

When I use the code you posted like so:

from PIL import Image

# Image.open can open image files of different extensions
im = Image.open(r"cat.jpg")
image_new = torch.from_numpy(im)

it gives me an error:

TypeError: expected np.ndarray (got JpegImageFile)

So the first solution doesn’t work.

With the second solution, I got this error when I did permute:

AttributeError: 'numpy.ndarray' object has no attribute 'permute'

The code was like so:

image = cv2.imread('cat.jpg',1)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # transform image from BGR to RGB
image = image.permute(2, 0, 1)

I want to make patches of size 32x32 with stride=16 (overlapping). How can I reshape/reconstruct these patches back into the original image?

Sorry for the confusion.
This should work:

import numpy as np
import PIL.Image
import torch
from torchvision import transforms

img = PIL.Image.open(path)  # path to your image file
x = torch.from_numpy(np.array(img))
x = x.permute(2, 0, 1)

# or
y = transforms.ToTensor()(img) # will permute and normalize to [0, 1]
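
If you prefer the OpenCV route, a minimal sketch (assuming cv2.imread returns a BGR uint8 array) would be:

import cv2
import torch

image = cv2.imread('cat.jpg', 1)                # numpy array in BGR order, shape [H, W, 3]
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB
z = torch.from_numpy(image)                     # convert to a tensor first
z = z.permute(2, 0, 1)                          # [H, W, 3] -> [3, H, W]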

nn.Unfold and nn.Fold will give you the ability to recreate the input, but note that the overlapping pixels will be summed.
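
A minimal sketch of that idea, using a made-up 224x224 input so the 32/16 kernel/stride setup tiles the image exactly, and dividing by the per-pixel overlap count to undo the summation:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)             # example input
unfold = nn.Unfold(kernel_size=32, stride=16)
fold = nn.Fold(output_size=(224, 224), kernel_size=32, stride=16)

patches = unfold(x)                         # [1, 3*32*32, L] columns of flattened patches
recon = fold(patches)                       # overlapping pixels are summed here
counts = fold(unfold(torch.ones_like(x)))   # how many patches cover each pixel
recon = recon / counts                      # normalize the sums back to the original values
print(torch.allclose(recon, x))             # True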

Now it’s giving me this error:

RuntimeError: maximum size for tensor at dimension 1 is 3 but size is 128

and the shape of x is:

torch.Size([1, 3, 576, 1024])

and my code is:

import numpy as np
import torch
from PIL import Image

img = Image.open("cat.jpg")
x = torch.from_numpy(np.array(img))
x = x.permute(2, 0, 1)

x = x.unsqueeze(0)

size = 128 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)

Why do you have 128 channels anyway? Aren't there 3: red, green, and blue?

I used your code snippet from this post. If you are not dealing with 128 channels, then you should change it. :wink:

Regarding the error message: you cannot use a kernel size of 128 for 3 channels.
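
For reference, a minimal sketch of that fix: keep the 3 channels intact and unfold only the spatial dimensions (use size = 128 here instead if you really want 128x128 patches):

import numpy as np
import torch
from PIL import Image

img = Image.open("cat.jpg")
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).unsqueeze(0)  # [1, 3, H, W]

size = 32    # patch size
stride = 32  # patch stride
patches = x.unfold(2, size, stride).unfold(3, size, stride)  # unfold only height and width
print(patches.shape)  # e.g. torch.Size([1, 3, 18, 32, 32, 32]) for a 576x1024 image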