How to extract smaller image patches (3D)?

If you can use nn.Fold, then you should stick to it. Otherwise, you might need to write a custom C++ extension or try to add a Python loop over multiple nn.Fold calls, if that’s possible.

Hi,

Can you tell me: if I want to save these patches to my local drive before writing a Dataset for them, in which format and how should I save them?

You could directly store the tensors using torch.save(patches, 'patches.pt') and later load them as patches = torch.load('patches.pt').
After loading them, you could use a TensorDataset to create a dataset with these tensors (and targets).
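A minimal sketch of that workflow (the tensor shapes and the targets below are just placeholders):

import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy patches and targets standing in for the real extracted data.
patches = torch.randn(100, 1, 32, 32, 32)  # e.g. [num_patches, C, D, H, W]
targets = torch.randint(0, 2, (100,))

# Save the patches to disk ...
torch.save(patches, 'patches.pt')

# ... and later load them back and wrap them in a TensorDataset.
patches = torch.load('patches.pt')
dataset = TensorDataset(patches, targets)
loader = DataLoader(dataset, batch_size=8, shuffle=True)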

Thank you for your response.
In my case, I now have data from two modalities saved as CT_Patch.pt and PET_Patch.pt tensors (I did the patch extraction and saved them as .pt files). I want to write a DataLoader for the input to my CycleGAN, where we have data from the PET domain and the CT domain for training.
So will I have to write two separate DataLoaders for this?

It depends a bit on how you would like to feed these data into your model(s).
If you want to create a batch of PET images only and another of CT images, you could use two DataLoaders. On the other hand, if you want to mix the modalities, you could use a single DataLoader.
Could you explain your use case a bit, i.e. how are the modalities used and in which steps?
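A rough sketch of the two-DataLoader approach for unpaired CycleGAN-style training (the file names and batch size are just placeholders):

import torch
from torch.utils.data import TensorDataset, DataLoader

ct_patches = torch.load('CT_Patch.pt')    # e.g. [N_ct, C, D, H, W]
pet_patches = torch.load('PET_Patch.pt')  # e.g. [N_pet, C, D, H, W]

ct_loader = DataLoader(TensorDataset(ct_patches), batch_size=4, shuffle=True)
pet_loader = DataLoader(TensorDataset(pet_patches), batch_size=4, shuffle=True)

# Draw one CT batch and one PET batch per iteration; zip stops at the shorter loader.
for (ct_batch,), (pet_batch,) in zip(ct_loader, pet_loader):
    pass  # forward both batches through the corresponding generators/discriminators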

Thank you very much for your suggestion. I’ve tried to fold the patches back through a loop of nn.Fold calls. It nearly works; however, there are still some problems which I cannot figure out. Would you please take a look and find the problem?

import cv2
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F


def show_tensor(tensor):
    # input (C x H x W)
    img_array = np.array(tensor, dtype=np.uint8)
    img_array = img_array.transpose(1, 2, 0)
    cv2.imshow('img', img_array)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


def fold_3d_official():
    kernel_size = (3,3,3)
    stride = (3,3,3)
    padding = (1,1,1)
    dilation = (1,1,1)
    img = cv2.imread('/home/katou2/Pictures/your_name_resize.png')
    img = np.array(img, dtype=np.float32)
    img = cv2.resize(img, (112, 112))

    img = torch.from_numpy(img)

    img_batch = []
    for i in range(16):
        img_batch.append(img)

    img_batch_tensor = torch.stack(img_batch)

    img_batch = img_batch_tensor.permute(3, 0, 1, 2)
    img_batch = img_batch.unsqueeze(0)
    # show_tensor(img_batch[0, :, 0, :, :])

    x = F.pad(img_batch, (padding[2], padding[2], padding[1], padding[1], padding[0], padding[0]))
    x = x.unfold(2, kernel_size[0], stride[0]).unfold(3, kernel_size[1], stride[1]).unfold(4, kernel_size[2], stride[2])
    x = x.permute(0, 1, 5, 6, 7, 2, 3, 4)
    x = x.contiguous().view(1, 3*3, -1)
    fold_1 = nn.Fold((16, 114*114), kernel_size=(3,1), dilation=(1,1), padding=(1,0), stride=(3,1))
    y = fold_1(x)
    y = y.contiguous().view(1, 3*16*3*3, -1)
    fold_2 = nn.Fold((112, 112), kernel_size=(3,3), dilation=(1,1), padding=(1,1), stride=(3,3))
    z = fold_2(y)
    z = z.contiguous().view(1, 3, 16, 112, 112)
    show_tensor(z[0, :, 1, :, :])

Do you get an error message (and could you post it here), or what exactly is not working at the moment?

The code is meant to rebuild an image from patches extracted by your method, using nn.Fold. It runs without an error message and an image is rebuilt, however the rebuilt image is slightly different from the original image, and I cannot figure out why.

Could you post the shape of the input tensor (after the permutation etc.) so that we could have a look?

If you are interested, you could run the following code with an input image path, and you will see the difference between the rebuilt image and the original image. Thanks.

import cv2
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F


def show_tensor(tensor):
    # input (C x H x W)
    img_array = np.array(tensor, dtype=np.uint8)
    img_array = img_array.transpose(1, 2, 0)
    cv2.imshow('img', img_array)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


def fold_3d_official(img_path):
    kernel_size = (3,3,3)
    stride = (3,3,3)
    padding = (1,1,1)
    dilation = (1,1,1)
    img = cv2.imread(img_path)
    img = np.array(img, dtype=np.float32)
    img = cv2.resize(img, (112, 112))
    
    img = torch.from_numpy(img)
    
    img_batch = []
    for i in range(16):
        img_batch.append(img)
    
    img_batch_tensor = torch.stack(img_batch)
    
    img_batch = img_batch_tensor.permute(3, 0, 1, 2) # torch.Size([3, 16, 112, 112])
    img_batch = img_batch.unsqueeze(0) # torch.Size([1, 3, 16, 112, 112])
    
    x = F.pad(img_batch, (padding[2], padding[2], padding[1], padding[1], padding[0], padding[0]))
    x = x.unfold(2, kernel_size[0], stride[0]).unfold(3, kernel_size[1], stride[1]).unfold(4, kernel_size[2], stride[2])
    x = x.permute(0, 1, 5, 6, 7, 2, 3, 4) # torch.Size([1, 3, 3, 3, 3, 6, 38, 38])
    
    x = x.contiguous().view(1, 3*3, -1) # torch.Size([1, 9, 77976])
    fold_1 = nn.Fold((16, 114*114), kernel_size=(3,1), dilation=(1,1), padding=(1,0), stride=(3,1))
    y = fold_1(x)
    y = y.contiguous().view(1, 3*16*3*3, -1) # torch.Size([1, 432, 1444])
    fold_2 = nn.Fold((112, 112), kernel_size=(3,3), dilation=(1,1), padding=(1,1), stride=(3,3))
    z = fold_2(y)
    z = z.contiguous().view(1, 3, 16, 112, 112) # torch.Size([1, 3, 16, 112, 112])
    show_tensor(z[0, :, 1, :, :])


if __name__ == "__main__":
    img_path = '/home/katou2/Pictures/your_name_resize.png'
    fold_3d_official(img_path)

I finally made it work. I just changed the permutation order from x = x.permute(0, 1, 5, 6, 7, 2, 3, 4) to x = x.permute(0, 1, 5, 2, 6, 7, 3, 4). Anyone who wants to fold 3D patches back could use the following code.

import cv2
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F


def show_tensor(tensor):
    # input (C x H x W)
    img_array = np.array(tensor, dtype=np.uint8)
    img_array = img_array.transpose(1, 2, 0)
    cv2.imshow('img', img_array)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


def fold_3d_official(img_path):
    kernel_size = (3,3,3)
    stride = (3,3,3)
    padding = (1,1,1)
    dilation = (1,1,1)
    img = cv2.imread(img_path)
    img = np.array(img, dtype=np.float32)
    img = cv2.resize(img, (112, 112))
    
    img = torch.from_numpy(img)

    img_batch = []
    for i in range(16):
        img_batch.append(img)
    
    img_batch_tensor = torch.stack(img_batch)
    
    img_batch = img_batch_tensor.permute(3, 0, 1, 2) # torch.Size([3, 16, 112, 112])
    img_batch = img_batch.unsqueeze(0) # torch.Size([1, 3, 16, 112, 112])
    
    x = F.pad(img_batch, (padding[2], padding[2], padding[1], padding[1], padding[0], padding[0]))
    x = x.unfold(2, kernel_size[0], stride[0]).unfold(3, kernel_size[1], stride[1]).unfold(4, kernel_size[2], stride[2])
    x = x.permute(0, 1, 5, 2, 6, 7, 3, 4) # torch.Size([1, 3, 3, 6, 3, 3, 38, 38])
    
    x = x.contiguous().view(1, 3*3, -1) # torch.Size([1, 9, 77976])
    fold_1 = nn.Fold((16, 114*114), kernel_size=(3,1), dilation=(1,1), padding=(1,0), stride=(3,1))
    y = fold_1(x)
    # y = y.contiguous().view(1, 3, 16, 114, 114)
    # show_tensor(y[0, :, 1, :, :])

    y = y.contiguous().view(1, 3*16*3*3, -1) # torch.Size([1, 432, 1444])
    fold_2 = nn.Fold((112, 112), kernel_size=(3,3), dilation=(1,1), padding=(1,1), stride=(3,3))
    z = fold_2(y)
    z = z.contiguous().view(1, 3, 16, 112, 112) # torch.Size([1, 3, 16, 112, 112])
    show_tensor(z[0, :, 1, :, :])


if __name__ == "__main__":
    img_path = '/home/katou2/Pictures/your_name_resize.png'
    fold_3d_official(img_path)

@ptrblck I really appreciate your help. Thank you very much; it helped a lot.


Hello, with respect to this particular example, and just for my understanding:

Suppose I have an image of size [284, 143, 143] converted to a numpy array, with dtype=float32 and ndim=3, and I want to extract 2D patches from it. How would the padding for the sliding window and the symmetric padding change here?

I mean, would the padding be as in the example, and then for unfold we would just do
ret = x.unfold(0, kernel_size, stride).unfold(1, kernel_size, stride) and not touch size[2]?
If that's the case, it would give torch.Size([5656, 64, 64]) 2D patches. Is that correct?

If we do not pad size[2] and use ret = x.unfold(0, kernel_size, stride).unfold(1, kernel_size, stride), it gives torch.Size([3030, 64, 64]).
Can you explain how the padding affects this, and how I can achieve 64x64 2D patches from [284, 143, 143] in this exact scenario?

I’m not sure why the padding should change if you are using numpy. Could you explain this question a bit, please?

It depends on what the dimensions represent in your example.
If I remember the original question correctly, all 3 dimensions created a volume, so all of them had to be padded.
Usually you would pad the spatial dimensions for 2D patches.

Could you explain what the dimensions stand for? I assume dim0 would be the channel dimension, since you would like to create 2D patches?
If so, you could reuse my code for the dim1 and dim2 padding.

Hello,
Sorry for creating confusion. My question is:
I have a medical image stored as a numpy array of size [284, 143, 143], where 284 is the number of slices, followed by H and W, with dtype=float32 and ndim=3.
If I want to extract 2D patches of size 64x64 from this image, how would I achieve it using unfold?

import torch.nn.functional as F

def extract_patches(img, kernel_size=64, stride=46):
    pad1_left = (img.size(1) // stride * stride + kernel_size) - img.size(1)
    pad2_left = (img.size(2) // stride * stride + kernel_size) - img.size(2)

    # Calculate symmetric padding
    pad1_right = pad1_left // 2 if pad1_left % 2 == 0 else pad1_left // 2 + 1
    pad2_right = pad2_left // 2 if pad2_left % 2 == 0 else pad2_left // 2 + 1

    pad1_left = pad1_left // 2
    pad2_left = pad2_left // 2
    x = F.pad(img, (pad2_left, pad2_right, pad1_left, pad1_right))

    ret = x.unfold(1, kernel_size, stride).unfold(2, kernel_size, stride).reshape(-1, 64, 64)
    ret = ret.unsqueeze(1)  # add a channel dimension
    return ret

My doubt is: will I have to perform the unfolding on dim 0 as well?

It depends on what each patch should contain.
If each 64x64 patch should contain all slices, then you should not unfold dim0.
On the other hand, if each patch should only contain a specific number of slices, you could also unfold dim0. If the patches should contain a single slice, it would probably be easier to just split the output.

Hello,

Can you explain what you mean by splitting the output?

To split dim0 into separate tensors, each with a size of 1, you could use x.split(1, dim=0).
This could be applied after unfolding dim1 and dim2, but it depends on your use case and what the result shape should be.
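A small sketch of that idea, assuming a dummy [slices, H, W] volume (the sizes below are made up and no padding is applied):

import torch

# Dummy volume: [slices, H, W].
x = torch.randn(284, 143, 143)

# Unfold only the spatial dims to get 64x64 patches per slice.
patches = x.unfold(1, 64, 46).unfold(2, 64, 46)  # [284, 2, 2, 64, 64]

# Split dim0 into one tensor per slice, each of shape [1, 2, 2, 64, 64].
per_slice = patches.split(1, dim=0)
print(len(per_slice), per_slice[0].shape)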

Hi @ptrblck, I have a similar problem: I also have H x W x D sized 3D images (no batch or channel dimension), and I wish to extract 3x3x3 patches, but overlapping.
I would need to first pad the image everywhere with infinity (because of the subsequent computations) and then do the unfolding, to make sure the number of patches in the end is the same as the number of voxels in my original image.
But I would need the output to be of size 27 x H x W x D, so that output[:, i, j, k] is the flattened 3x3x3 patch centered in the original image around [i, j, k].

Can I do this directly with unfold as well? I suppose I could use reshape on top of it, but I have no clue how to make sure that the values really end up in the right places after reshaping…

I also checked the view method (I assume that would be faster, considering it’s just another view of the same tensor?), but as far as I understood, view only works for non-overlapping patches… Or did I misunderstand?

I’m not sure I understand the desired output shape correctly.
unfold would create a specific number of patches, where each patch would have the specified kernel shape.
It seems your output should have some H, W, D dimensions, which might not be the kernel shape?
If so, how would these shapes be calculated?

I assume you would like to somehow reshape the dimension containing the patches, so that you could index neighboring patches using i, j, k?

So, if the original image IM has size H x W x D, I pad it all around with 1 pixel of Inf, to get something, let's call it IM2, of size (H+2) x (W+2) x (D+2).
Then I would like to extract all the 3x3x3 patches from there, with centers on the pixels of the original image (so the first patch would span IM2[0:3, 0:3, 0:3], and its center corresponds to the first pixel in IM, IM[0,0,0]).

If I do three unfolds on IM2 one after another, as you have suggested in some answers above, I get something of size H x W x D x 3 x 3 x 3. What I wish is to have the patch dimensions flattened instead, to get H x W x D x 27.

So what I wonder is whether the three unfolds + reshape would really give me exactly this, with the values in the right places? That is, if the patch centered at i,j,k is [[1,2,3],[4,5,6],[7,8,9]], then in the final H x W x D x 27 array the elements [i,j,k,:] should be [1,2,3,4,5,6,7,8,9].
In addition: is it possible to do this entire thing, or at least some parts of it, using .view instead? Because I am working with very large 3D arrays, reshaping and copying the data in any way should probably be avoided… But I just don't see how to use view in this sense (to get overlapping patches). And if I just use .view(h, w, d, 27) on the unfolded array of size H x W x D x 3 x 3 x 3, it complains that the tensor is not contiguous… So I am a bit lost as to how to do this efficiently.
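For concreteness, this is the kind of thing I have in mind (a minimal sketch with made-up sizes; reshape is used here because the unfolded result is a non-contiguous view, and the center-voxel check is just a quick sanity test):

import torch
import torch.nn.functional as F

# Dummy H x W x D volume; the sizes are arbitrary.
im = torch.arange(4 * 5 * 6, dtype=torch.float32).reshape(4, 5, 6)

# Pad every side with one voxel of Inf.
im2 = F.pad(im, (1, 1, 1, 1, 1, 1), value=float('inf'))  # (H+2) x (W+2) x (D+2)

# Three overlapping unfolds (kernel 3, stride 1) -> H x W x D x 3 x 3 x 3.
patches = im2.unfold(0, 3, 1).unfold(1, 3, 1).unfold(2, 3, 1)

# Flatten the three kernel dims; reshape copies where necessary.
patches = patches.reshape(*im.shape, 27)  # H x W x D x 27

# Sanity check: the patch at [i, j, k] should have im[i, j, k] at its center.
i, j, k = 2, 3, 4
print(patches[i, j, k].view(3, 3, 3)[1, 1, 1] == im[i, j, k])  # tensor(True)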