How to extract patches from an image

Given an image, I need to extract patches with stride of 32, after changing the smaller dimension of the image to 227.

How can I do that?

You could use unfold as described in this post.
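
As a rough sketch of this suggestion (not the code from the linked post), the snippet below resizes the image so that its smaller side is 227 and then unfolds only the height and width dimensions. The 32x32 patch size, the helper names `resize_smaller_side` and `extract_patches`, and the random stand-in image are assumptions; the question only fixes the stride of 32.

```python
import torch
import torch.nn.functional as F


def resize_smaller_side(img, target=227):
    """Resize a float [C, H, W] tensor so that min(H, W) == target."""
    _, h, w = img.shape
    if h <= w:
        new_h, new_w = target, round(w * target / h)
    else:
        new_h, new_w = round(h * target / w), target
    out = F.interpolate(img.unsqueeze(0), size=(new_h, new_w),
                        mode="bilinear", align_corners=False)
    return out.squeeze(0)


def extract_patches(img, size=32, stride=32):
    """Return patches of a [C, H, W] tensor as [N, C, size, size]."""
    c = img.shape[0]
    # Unfold height (dim 1), then width (dim 2): [C, nH, nW, size, size]
    patches = img.unfold(1, size, stride).unfold(2, size, stride)
    # Move the patch grid to the front and flatten it into one dimension
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, size, size)


img = torch.rand(3, 576, 1024)      # stand-in for a loaded RGB image
resized = resize_smaller_side(img)  # smaller side becomes 227
patches = extract_patches(resized)
print(resized.shape, patches.shape)
```

Rows or columns that do not fill a complete window are dropped by `unfold`, so pad the image first if the border matters.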

I get this error:

RuntimeError: maximum size for tensor at dimension 3 is 3 but size is 32

when I run this code:

S = 128 # channel dim
W = 227 # width
H = 227 # height
batch_size = 10

x = image_new.unsqueeze(0)

size = 32 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)

The following code works for me:

S = 128 # channel dim
W = 227 # width
H = 227 # height
batch_size = 10

x = torch.randn(batch_size, S, H, W)

size = 32 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)

What shape does x have after the unsqueeze operation?

I first loaded the image like so:

import numpy as np
import cv2

img = cv2.imread('cat.jpg',1)

and then I used the code above

and the size of x is:

torch.Size([1, 576, 1024, 3])

dim3 has only a size of 3, so you cannot unfold it with a kernel size of 32.
I guess you would like to unfold only in dim1 and dim2?

Unrelated to this, but note that PyTorch expects image tensors in the shape [batch_size, channels, height, width].
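
To illustrate both points, here is a small sketch that starts from a tensor with the channels-last shape reported above, permutes it to [batch_size, channels, height, width], and unfolds only the two spatial dimensions. The zero-filled tensor is a stand-in for the loaded image, and the variable names are only illustrative.

```python
import torch

# Stand-in with the reported channels-last shape [1, H, W, C]
x_nhwc = torch.zeros(1, 576, 1024, 3, dtype=torch.uint8)

# Reorder to [batch_size, channels, height, width]
x_nchw = x_nhwc.permute(0, 3, 1, 2)

size = 32    # patch size
stride = 32  # patch stride

# Unfold height (dim 2) and width (dim 3) only; the channel dim is left alone
patches = x_nchw.unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)  # torch.Size([1, 3, 18, 32, 32, 32])
```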

So loading the image with OpenCV and then converting it to a tensor is wrong? Is there another way to convert the image to a tensor so that it has the dimensions you specified?

Also, when I unfold only dim1 and dim2, I get this size:

torch.Size([1, 18, 32, 3, 32, 32])

and when I show the image using plt, there’s still something wrong
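
One possible cause, assuming the patch is passed to matplotlib as it comes out of `unfold`: with that output shape, each patch is stored as [channels, patch_height, patch_width], while `plt.imshow` expects [height, width, channels]. The sketch below uses a random channels-last tensor as a stand-in for the image and arbitrary patch indices.

```python
import torch

# Stand-in for the channels-last image tensor [1, H, W, C]
x = torch.randint(0, 256, (1, 576, 1024, 3), dtype=torch.uint8)

# Unfolding dim 1 and dim 2 gives [1, 18, 32, 3, 32, 32]:
# (batch, patch row, patch column, channels, patch height, patch width)
patches = x.unfold(1, 32, 32).unfold(2, 32, 32)

row, col = 2, 5  # arbitrary patch indices, chosen for illustration
patch = patches[0, row, col].permute(1, 2, 0)  # -> [32, 32, 3]

# The patch is the matching 32x32 crop of the original image
expected = x[0, row * 32:(row + 1) * 32, col * 32:(col + 1) * 32, :]
print(patch.shape, torch.equal(patch, expected))
```

A patch in this layout can be shown with `plt.imshow(patch.numpy())`. Colors will still look swapped if the array came from OpenCV and was not converted from BGR to RGB.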

You could use PIL to load the image and then torch.from_numpy to create the tensor or alternatively use OpenCV, transform the image from BGR to RGB, and permute the dimensions via image = image.permute(2, 0, 1).

The first solution gave me the same dimensions as before. Also, I had to use different code, because when I use the code you posted like so:

from PIL import Image
# open method used to open different extension image file
im = Image.open("cat.jpg")
image_new = torch.from_numpy(im)

it gives me an error :

TypeError: expected np.ndarray (got JpegImageFile)

So the first solution doesn’t work.

With the second solution, I got this error when I called permute:

AttributeError: 'numpy.ndarray' object has no attribute 'permute'

The code was:

image = cv2.imread('cat.jpg',1)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # transform image from BGR to RGB
image = image.permute(2, 0, 1)
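
The AttributeError arises because `cv2.imread` returns a NumPy array, and `permute` is a method of torch tensors, so the array has to be wrapped with `torch.from_numpy` first. The sketch below uses a synthetic BGR array in place of the `cv2.imread` result so that it runs without an image file, and the helper name `bgr_image_to_tensor` is made up for illustration.

```python
import numpy as np
import torch


def bgr_image_to_tensor(bgr):
    """Convert an [H, W, 3] BGR uint8 array to a [3, H, W] RGB tensor."""
    # Reversing the last axis reorders BGR to RGB, like cv2.COLOR_BGR2RGB.
    # ascontiguousarray makes a copy, because torch.from_numpy does not
    # accept arrays with negative strides.
    rgb = np.ascontiguousarray(bgr[..., ::-1])
    return torch.from_numpy(rgb).permute(2, 0, 1)


# Synthetic stand-in for cv2.imread('cat.jpg', 1): a pure blue BGR image
bgr = np.zeros((576, 1024, 3), dtype=np.uint8)
bgr[..., 0] = 255

image = bgr_image_to_tensor(bgr)
print(image.shape)  # torch.Size([3, 576, 1024])
```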

I want to make patches of size 32x32 with stride=16 (overlapping). How can I reshape/reconstruct these patches into the original image?

Sorry for the confusion.
This should work:

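import numpy as np
import torch
from PIL import Image
from torchvision import transforms

img = Image.open("cat.jpg")  # file name taken from the question above
x = torch.from_numpy(np.array(img))  # [H, W, C], uint8
x = x.permute(2, 0, 1)  # [C, H, W]

# or
y = transforms.ToTensor()(img) # will permute and normalize to [0, 1]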

nn.Unfold and nn.Fold will give you the ability to recreate the input, but note that the overlapping pixels will be summed.
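
A minimal sketch of that round trip, using the functional forms `F.unfold` and `F.fold` with the 32x32 patches and stride of 16 from the question. The input size of 64x96 is an assumption, chosen so that the windows tile the image exactly. To undo the summation, the folded result is divided by a fold of ones, which counts how many patches cover each pixel.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 64, 96)  # stand-in image batch [B, C, H, W]
size, stride = 32, 16         # overlapping patches
out_size = tuple(x.shape[-2:])

# [B, C * size * size, L], where L is the number of patches
cols = F.unfold(x, kernel_size=size, stride=stride)

# Optional view of the patches as [B, L, C, size, size]
patches = cols.transpose(1, 2).reshape(1, -1, 3, size, size)

# fold sums the overlapping pixels ...
summed = F.fold(cols, output_size=out_size, kernel_size=size, stride=stride)
# ... so divide by the number of patches that cover each pixel
counts = F.fold(torch.ones_like(cols), output_size=out_size,
                kernel_size=size, stride=stride)
recon = summed / counts

print(torch.allclose(recon, x, atol=1e-6))
```

If the windows do not tile the image exactly, some border pixels are covered by no patch, `counts` is zero there, and the image should be padded before unfolding.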

Now it’s giving me this error:

RuntimeError: maximum size for tensor at dimension 1 is 3 but size is 128

and the shape of x is :

torch.Size([1, 3, 576, 1024])

and my code is :

from PIL import Image 

img = Image.open("cat.jpg")
x = torch.from_numpy(np.array(img))
x = x.permute(2, 0, 1)

x = x.unsqueeze(0)

size = 128 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)

Why do you have 128 channels anyway? Aren't there 3: red, green, and blue?

I used your code snippet from this post. If you are not dealing with 128 channels, then you should change it.

Regarding the error message: you cannot use a kernel size of 128 for 3 channels.
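
For a 3-channel image, one way around this is to skip the channel dimension and unfold only height and width. The sketch below keeps the patch size of 128 and stride of 32 from the failing snippet and uses a zero tensor with the reported shape as a stand-in for the image.

```python
import torch

x = torch.zeros(1, 3, 576, 1024)  # stand-in with the reported shape

size = 128   # patch size
stride = 32  # patch stride

# Unfold height (dim 2) and width (dim 3); dim 1 keeps its 3 channels
patches = x.unfold(2, size, stride).unfold(3, size, stride)

# Each unfolded dimension yields (length - size) // stride + 1 windows
n_h = (x.shape[2] - size) // stride + 1
n_w = (x.shape[3] - size) // stride + 1

print(patches.shape, n_h, n_w)  # torch.Size([1, 3, 15, 29, 128, 128]) 15 29
```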

Kornia has a small API to extract and combine patches.

The exact function that you are looking for is extract_tensor_patches