Given an image, I need to extract patches with a stride of 32, after resizing the smaller dimension of the image to 227.
How can I do that?
You could use unfold as described in this post.
I get this error:
RuntimeError: maximum size for tensor at dimension 3 is 3 but size is 32
when I run this code:
S = 128 # channel dim
W = 227 # width
H = 227 # height
batch_size = 10
x = image_new.unsqueeze(0)
size = 32 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)
The following code works for me:
S = 128 # channel dim
W = 227 # width
H = 227 # height
batch_size = 10
x = torch.randn(batch_size, S, H, W)
size = 32 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)
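For reference, the resulting shape follows from the sliding-window count floor((L - size) / stride) + 1 along each unfolded dim. A quick sketch of the arithmetic (pure Python, no tensors needed; the values are the ones from the snippet above):

```python
# Number of windows unfold produces along one dimension of length `length`.
def n_windows(length, size, stride):
    return (length - size) // stride + 1

batch_size, S, H, W = 10, 128, 227, 227
size = stride = 32
shape = [batch_size,
         n_windows(S, size, stride),   # dim 1: channels -> 4
         n_windows(H, size, stride),   # dim 2: height   -> 7
         n_windows(W, size, stride),   # dim 3: width    -> 7
         size, size, size]             # the extracted 32x32x32 windows
print(shape)  # [10, 4, 7, 7, 32, 32, 32]
```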
What shape does x have after the unsqueeze operation?
I first loaded the image like so:
import numpy as np
import cv2
import torch

img = cv2.imread('cat.jpg', 1)
image_new = torch.from_numpy(img)
and then I used the code above
and the shape of x is:
torch.Size([1, 576, 1024, 3])
dim3 has only a size of 3, so you cannot unfold it with a kernel size of 32.
I guess you would like to unfold only in dim1 and dim2?
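For illustration, here is a small sketch (with a dummy zero tensor standing in for the loaded image) of what unfolding only dim1 and dim2 of the channels-last [1, H, W, C] tensor from OpenCV produces:

```python
import torch

# Dummy stand-in for the image loaded via cv2 (channels-last layout).
x = torch.zeros(1, 576, 1024, 3)
size, stride = 32, 32

# Unfold only the spatial dims; the channel dim (size 3) is left alone.
patches = x.unfold(1, size, stride).unfold(2, size, stride)
print(patches.shape)  # torch.Size([1, 18, 32, 3, 32, 32])
```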
Unrelated to this, but note that PyTorch expects image tensors in the shape [batch_size, channels, height, width].
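A minimal sketch of that layout conversion, using a dummy NumPy array in place of the loaded image (in PyTorch the equivalent call on a tensor is permute(2, 0, 1)):

```python
import numpy as np

# cv2.imread / PIL give HWC (height, width, channels); PyTorch wants CHW.
img_hwc = np.zeros((576, 1024, 3), dtype=np.uint8)  # stand-in for the image
img_chw = np.transpose(img_hwc, (2, 0, 1))          # same as tensor.permute(2, 0, 1)
print(img_chw.shape)  # (3, 576, 1024)
```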
So the method of loading the image using OpenCV and then converting it to a tensor is wrong? Is there another way to convert the image to a tensor so that it has the dimensions you specified?
Also, when I unfold only dim1 and dim2, I get this shape:
torch.Size([1, 18, 32, 3, 32, 32])
and when I display the image using plt, something still looks wrong.
You could use PIL to load the image and then torch.from_numpy to create the tensor, or alternatively use OpenCV, transform the image from BGR to RGB, and permute the dimensions via image = image.permute(2, 0, 1).
The first solution gave me the same dimensions as before; also, I had to use different code, like so:
image = torch.as_tensor(np.array(image).astype('float'))
When I use the code you posted, like so:
from PIL import Image
import torch

# Image.open handles different image file extensions
im = Image.open(r"cat.jpg")
image_new = torch.from_numpy(im)
it gives me an error:
TypeError: expected np.ndarray (got JpegImageFile)
So the first solution doesn’t work.
With the second solution, I got this error when I called permute:
AttributeError: 'numpy.ndarray' object has no attribute 'permute'
The code was like so:
image = cv2.imread('cat.jpg', 1)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # transform image from BGR to RGB
image = image.permute(2, 0, 1)
I want to make patches of size 32x32 with stride=16 (overlapping). How can I reshape/reconstruct these patches back into the original image?
Sorry for the confusion.
This should work:
import PIL.Image
import numpy as np
import torch
from torchvision import transforms

img = PIL.Image.open(path)
x = torch.from_numpy(np.array(img))
x = x.permute(2, 0, 1)
# or
y = transforms.ToTensor()(img)  # will permute and normalize to [0, 1]
nn.Unfold and nn.Fold will give you the ability to recreate the input, but note that the overlapping pixels will be summed.
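A minimal sketch of that round trip, using a dummy random image and the functional forms F.unfold/F.fold: since fold sums overlapping pixels, the per-pixel overlap count is computed the same way on a tensor of ones and used to normalize the reconstruction.

```python
import torch
import torch.nn.functional as F

# Dummy image; 64 is chosen so the 32x32/stride-16 windows tile it exactly.
x = torch.randn(1, 3, 64, 64)
size, stride = 32, 16

patches = F.unfold(x, kernel_size=size, stride=stride)   # [1, 3*32*32, L]
summed = F.fold(patches, output_size=(64, 64),
                kernel_size=size, stride=stride)         # overlaps are summed

# Count how many patches cover each pixel, then normalize.
ones = torch.ones_like(x)
divisor = F.fold(F.unfold(ones, kernel_size=size, stride=stride),
                 output_size=(64, 64), kernel_size=size, stride=stride)
recon = summed / divisor

print(torch.allclose(recon, x, atol=1e-6))  # True
```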
Now it’s giving me this error:
RuntimeError: maximum size for tensor at dimension 1 is 3 but size is 128
and the shape of x is:
torch.Size([1, 3, 576, 1024])
and my code is:
from PIL import Image
import numpy as np
import torch

img = Image.open("cat.jpg")
x = torch.from_numpy(np.array(img))
x = x.permute(2, 0, 1)
x = x.unsqueeze(0)
size = 128 # patch size
stride = 32 # patch stride
patches = x.unfold(1, size, stride).unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)
Why do you have 128 channels anyway? Aren't there just 3: red, green, and blue?
I used your code snippet from this post. If you are not dealing with 128 channels, then you should change it.
Regarding the error message: you cannot use a kernel size of 128 for 3 channels.
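For a [1, 3, H, W] image you would therefore unfold only the spatial dims. A sketch with a dummy tensor of the shape from this thread:

```python
import torch

# Dummy stand-in for the [1, 3, 576, 1024] image tensor.
x = torch.randn(1, 3, 576, 1024)
size, stride = 32, 32

# Unfold dims 2 and 3 only; dim 1 holds just the 3 RGB channels,
# so a 128-wide (or even 32-wide) window cannot fit there.
patches = x.unfold(2, size, stride).unfold(3, size, stride)
print(patches.shape)  # torch.Size([1, 3, 18, 32, 32, 32])
```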
There is a small API to extract/combine patches in Kornia:
https://kornia.readthedocs.io/en/latest/contrib.html#image-patches
The exact function you are looking for is extract_tensor_patches.