Can we divide up input images into patches?

Thanks for the help!

Is there a way to divide up our input image into, let us say, 16x16 pixel patches via a custom ImageFolder? Ideally, the image would be divided into non-overlapping patches, and each patch could be used as an individual data point to train the model.

I have read about other people using two different transforms on the same dataset, but that approach does not divide up the entire image.

I have also read about doing this with the unfold function, but I was not able to get it to work.

I would expect to get 6 * 9 = 54 patches here, so I would expect the dimensions to be (54, 16, 16), but instead I get torch.Size([103, 9, 8, 1, 16, 16]):

x = np.load("/Users/maycaj/Documents/Hyperspectral-Imaging_II/Sunscreen Input/Train/Lower Leg/CT30_1Sunscreen.npy")
print(x.shape)
print(f"Num in width: {x.shape[0]//16} Num in height: {x.shape[1]//16}")
x_tensor = torch.from_numpy(x)
unfold = x_tensor.unfold(1, 16, 16).unfold(2, 16, 16).unfold(3, 16, 16)
print(unfold.shape)
(103, 152, 128)
Num in width: 6 Num in height: 9
torch.Size([103, 9, 8, 1, 16, 16])

Using unfold sounds like the right approach. In your example you are unfolding 3 dimensions, while I would expect you to unfold only the spatial dimensions?

Thanks for your response!
Yes, exactly. Right now I have images shaped (H, W, C), e.g. (103, 152, 128). I would like to divide them up along the spatial dimensions, so I should get (54, 16, 16, 128) as my dimensions (batch, height, width, channels).

This code should work:

import torch

x = torch.arange(103*152*128).view(103, 152, 128)
print(x)

# unfold height (dim 0) and width (dim 1) into non-overlapping 16x16 patches
out = x.unfold(0, 16, 16).unfold(1, 16, 16)
print(out.shape)
# torch.Size([6, 9, 128, 16, 16])
# move channels last and merge the 6x9 patch grid into a single batch dimension
out = out.permute(0, 1, 3, 4, 2).contiguous().view(-1, 16, 16, 128)
print(out.shape)
# torch.Size([54, 16, 16, 128])
print(out)

This post might be helpful, too.

Thanks for the response, that worked perfectly!

Is there a way to incorporate this into my dataset loader?
My goal is to have each .npy image that is loaded split into the 16x16 (h x w) patches.

# Write a custom dataset class (inherits from torch.utils.data.Dataset)
import pathlib
from typing import Tuple

import numpy as np
import torch
from torch.utils.data import Dataset
# assumes a find_classes(targ_dir) helper is defined elsewhere and returns (classes, class_to_idx)

# 1. Subclass torch.utils.data.Dataset
class ImageFolderCustom(Dataset):
    
    # 2. Initialize with a targ_dir and transform (optional) parameter
    def __init__(self, targ_dir: str, transform=None) -> None:
        
        # 3. Create class attributes
        # Get all image paths
        self.paths = list(pathlib.Path(targ_dir).glob("*/*.npy")) # modified from '*/*.jpg'
        # Setup transforms
        self.transform = transform
        # Create classes and class_to_idx attributes
        self.classes, self.class_to_idx = find_classes(targ_dir)

    # 4. Make function to load images
    def load_image(self, index: int) -> torch.Tensor: # modified: returns a tensor instead of a PIL Image
        "Loads a .npy image via its path and returns it as a tensor."
        image_path = self.paths[index]
        image = np.load(image_path) # modified from opening a PNG file
        # print(image.shape)
        # change from [696, 520, 128] [width, height, channels] to [channels, height, width]
        image = np.transpose(image, (2, 1, 0))
        image = torch.from_numpy(image)
        return image
    
    # 5. Overwrite the __len__() method (optional but recommended for subclasses of torch.utils.data.Dataset)
    def __len__(self) -> int:
        "Returns the total number of samples."
        return len(self.paths)
    
    # 6. Overwrite the __getitem__() method (required for subclasses of torch.utils.data.Dataset)
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        "Returns one sample of data, data and label (X, y)."
        img = self.load_image(index)
        
        class_name = self.paths[index].parent.name # expects path in data_folder/class_name/image.npy
        class_idx = self.class_to_idx[class_name]
        # Transform if necessary
        if self.transform:
            return self.transform(img), class_idx # return data, label (X, y)
        else:
            return img, class_idx # return data, label (X, y)

Yes, unfold the image tensor in __getitem__ and return all patches. Inside the DataLoader loop you could then flatten the patches into the batch dimension (and thus increase the batch size), or iterate over them in a separate loop.
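
A minimal sketch of the first option, assuming every image yields the same number of patches so that the default collate can stack them (the loop variables are illustrative):

for patches, labels in train_dataloader:
    # patches: [batch_size, num_patches, C, 16, 16], labels: [batch_size]
    b, n, c, h, w = patches.shape
    patches = patches.reshape(b * n, c, h, w)  # fold the patch dimension into the batch dimension
    labels = labels.repeat_interleave(n)       # repeat each label once per patch
    # patches and labels can now be passed to the model as one larger batch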

I modified the custom dataset class so that the patches from each image end up in the batch:

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        "Returns one sample of data, data and label (X, y)."
        img = self.load_image(index)
        print(f"size 1: {img.size()}")
        # added to divide up image into 16x16 patches
        img = img.unfold(1, 16, 16).unfold(2, 16, 16)
        print(f"size 2: {img.size()}")
        # e.g. torch.Size([128, 5, 5, 16, 16]) -> (channels, patches_h, patches_w, 16, 16)
        # move the patch grid dimensions to the front, then merge them into one dimension
        img = img.permute(1, 2, 0, 3, 4).contiguous()
        print(f"size 3: {img.size()}")
        # view sets the new dimensions, and -1 infers the merged patch dimension
        img = img.view(-1, 128, 16, 16)
        print(f"size 4: {img.size()}")
        # end of added part to divide image

        class_name = self.paths[index].parent.name # expects path in data_folder/class_name/image.npy
        class_idx = self.class_to_idx[class_name]
        # Transform if necessary
        if self.transform:
            return self.transform(img), class_idx # return data, label (X, y)
        else:
            return img, class_idx # return data, label (X, y)

size 1: torch.Size([128, 87, 90])
size 2: torch.Size([128, 5, 5, 16, 16])
size 3: torch.Size([5, 5, 128, 16, 16])
size 4: torch.Size([25, 128, 16, 16])

size 1: torch.Size([128, 93, 96])
size 2: torch.Size([128, 5, 6, 16, 16])
size 3: torch.Size([5, 6, 128, 16, 16])
size 4: torch.Size([30, 128, 16, 16])

When I try to run the model, there is an issue with stack: it expects each tensor to be the same size. From the first two images in my output you can see the batch size is 25 and then 30, hence the error. How can I combine these without running into it?
Perhaps a custom collate function?
Thanks!

Yes, a custom collate_fn should work, allowing you to concatenate the patches.
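
A minimal sketch of what such a collate_fn could look like, assuming each dataset sample is a (patches, label) tuple with patches shaped [num_patches, C, 16, 16] (the function name is illustrative):

import torch

def patch_collate(batch):
    # concatenate the per-image patch tensors along the patch dimension
    imgs = torch.cat([patches for patches, _ in batch], dim=0)
    # repeat each label once per patch so imgs and labels stay aligned
    labels = torch.cat([
        torch.full((patches.size(0),), label, dtype=torch.long)
        for patches, label in batch
    ])
    return imgs, labels

# passed to the DataLoader via collate_fn=patch_collate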

For the sake of getting a single pass working on my model first, I set the batch size = 1 to bypass the need for a collate_fn.

After that, I originally got this error:

{
	"name": "RuntimeError",
	"message": "Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 28, 128, 16, 16]",
	"stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[22], line 17
     15 model_0.eval()
     16 with torch.inference_mode():
---> 17     pred = model_0(img_single.to(device))
     19 # 4. Print out what's happening and convert model logits -> pred probs -> pred label
     20 print(f\"Output logits:\
{pred}\
\")

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

Cell In[21], line 45, in TinyVGG.forward(self, x)
     44 def forward(self, x: torch.Tensor):
---> 45     x = self.conv_block_1(x)
     46     # print(x.shape)
     47     x = self.conv_block_2(x)

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/container.py:215, in Sequential.forward(self, input)
    213 def forward(self, input):
    214     for module in self:
--> 215         input = module(input)
    216     return input

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/conv.py:460, in Conv2d.forward(self, input)
    459 def forward(self, input: Tensor) -> Tensor:
--> 460     return self._conv_forward(input, self.weight, self.bias)

File ~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/conv.py:456, in Conv2d._conv_forward(self, input, weight, bias)
    452 if self.padding_mode != 'zeros':
    453     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    454                     weight, bias, self.stride,
    455                     _pair(0), self.dilation, self.groups)
--> 456 return F.conv2d(input, weight, bias, self.stride,
    457                 self.padding, self.dilation, self.groups)

RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 28, 128, 16, 16]"
}
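
The extra leading dimension comes from the DataLoader's default collate adding a batch dimension on top of the patch dimension, so the input to conv2d ends up 5D. A minimal sketch of merging those two dimensions before the forward pass (reusing img_single, model_0, and device from the snippet in the traceback):

# img_single: [1, num_patches, 128, 16, 16] after the default collate with batch_size=1
img_single = img_single.flatten(0, 1)  # -> [num_patches, 128, 16, 16]
with torch.inference_mode():
    pred = model_0(img_single.to(device))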

Then, I went back to trying to get the collate_fn to work with a batch size of 16. This is what collate_fn looks like for me:

def custom_collate(data):
    imgs = []
    labels = []
    for sample in data:
        # one label per image
        label = sample[1]
        # multiple patches per image, so iterate over them
        for img in sample[0]:
            imgs.append(img)
            labels.append(label)
    imgs = torch.stack(imgs)
    labels = torch.tensor(labels)
    return imgs, labels

The issue that I am running into is that it is still returning batch sizes that are inconsistent. For example, here are the first 3:
[36, 128, 16, 16]
[475, 128, 16, 16]
[56, 128, 16, 16]
Then, when I go to train, there are errors due to the inconsistent batch size. Is there a way to tweak what I have so that the batch size is determined by the DataLoader setup? With the code below, I would like it to be 16, for example:

train_dataloader_custom = DataLoader(dataset=train_data_custom, # use custom created train Dataset
                                     batch_size=16, # how many samples per batch?
                                     num_workers=0, # how many subprocesses to use for data loading? (higher = more)
                                                    # subprocesses are multiple processes, all loading data simultaneously
                                     shuffle=True,  # shuffle the data?
                                     collate_fn = custom_collate
                                    )

Thanks so much for your help as always, I hope what I am saying makes sense :slight_smile:

Thanks

Narrow down which part of the code changes the batch size, as it's unexpected unless the input resolution of the images differs. If so, resize them to a constant size.
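
A minimal sketch of one way to do that, center-cropping each (C, H, W) tensor to a fixed size that is a multiple of 16 before unfolding (the 96x96 target and the helper name are only illustrative, and it assumes every image is at least that large):

import torch

def center_crop(img: torch.Tensor, out_h: int = 96, out_w: int = 96) -> torch.Tensor:
    # img: (C, H, W); crop so every image yields the same number of 16x16 patches
    _, h, w = img.shape
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return img[:, top:top + out_h, left:left + out_w]

# a 96x96 crop always gives (96 // 16) * (96 // 16) = 36 patches per image,
# so every sample returned by __getitem__ has the same shape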

I was able to get past the issue with the inconsistent batch size.

Now I am having an issue with training the model. Most of my gradients are None, except for one:

Parameter name: conv_block_1.0.bias:
Parameter data: tensor([-0.0284,  0.0071, -0.0305, -0.0439,  0.0602,  0.0827,  0.0547, -0.0020,
         0.0655, -0.0196])
Parameter gradient: tensor([ 0.0000e+00,  0.0000e+00, -5.9326e-01,  5.6169e-01,  0.0000e+00,
         4.5618e-03,  4.5334e-03,  1.0739e-04, -2.4907e-05,  0.0000e+00])

Most look like this:

Parameter gradient: None

I am printing out after training like this:

for name, param in model_0.named_parameters():
    if param.requires_grad:
        print('Before step:')
        print(f'Parameter name: {name}:')
        print(f'Parameter data: {param.data}')
        print(f'Parameter gradient: {param.grad}')

The data being passed into the model looks like this:

Batch: 2
X shape: torch.Size([30, 128, 16, 16])
Y shape: torch.Size([30])
[ImageFolderCustom] img size 1: torch.Size([128, 159, 115])
[ImageFolderCustom] img size 2: torch.Size([128, 9, 7, 16, 16])
[ImageFolderCustom] img size 3: torch.Size([9, 7, 128, 16, 16])
[ImageFolderCustom] img size 4: torch.Size([63, 128, 16, 16])
[custom_collate] batch type:  <class 'tuple'>
Batch: 3
X shape: torch.Size([63, 128, 16, 16])
Y shape: torch.Size([63])
[ImageFolderCustom] img size 1: torch.Size([128, 210, 101])
[ImageFolderCustom] img size 2: torch.Size([128, 13, 6, 16, 16])
[ImageFolderCustom] img size 3: torch.Size([13, 6, 128, 16, 16])
[ImageFolderCustom] img size 4: torch.Size([78, 128, 16, 16])
[custom_collate] batch type:  <class 'tuple'>

The data coming out looks like this:

y.size(), test_pred_labels.size()
(torch.Size([3]), torch.Size([30]))

I am not sure where to look to solve this issue, and I feel like I am not providing the correct information.

Could you post the current model definition including the forward pass?

Hello,
Thanks so much for responding.

import torch
from torch import nn
class TinyVGG(nn.Module):
    """
    Model architecture copying TinyVGG from: 
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, 
                      out_channels=hidden_units, 
                      kernel_size=3, # how big is the square that's going over the image?
                      stride=1, # default
                      padding=1), # options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number 
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, 
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2) # default stride value is same as kernel_size
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from? 
            # It's because each layer of our network compresses and changes the shape of our inputs data.
            # nn.Linear(in_features= 31360, # modified from: hidden_units*16*16,
            #           # above is same as 10*130*174
            #           out_features=output_shape)
            nn.LazyLinear(out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        print(f"[forward] x shape 1: {x.shape}")
        x = self.conv_block_2(x)
        print(f"[forward] x shape 2: {x.shape}")
        x = self.classifier(x)
        print(f"[forward] x shape 3: {x.shape}")
        return x
        # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- leverage the benefits of operator fusion

torch.manual_seed(42)
model_0 = TinyVGG(input_shape=128, # number of color channels (3 for RGB) - modified
                  hidden_units=10, 
                  output_shape=len(train_data_custom.classes)).to(device)
model_0

Your code works for me and shows valid gradients for all registered parameters:

torch.manual_seed(42)
model_0 = TinyVGG(input_shape=128, # number of color channels (3 for RGB) - modified
                  hidden_units=10, 
                  output_shape=10)

x = torch.randn(16, 128, 224, 224)
out = model_0(x)
print(out.shape)
# torch.Size([16, 10])

out.mean().backward()

# check gradients
for name, param in model_0.named_parameters():
    print("param: {}, grad.abs().sum(): {}".format(name, param.grad.abs().sum()))
# param: conv_block_1.0.weight, grad.abs().sum(): 7.223124027252197
# param: conv_block_1.0.bias, grad.abs().sum(): 0.0075920759700238705
# param: conv_block_1.2.weight, grad.abs().sum(): 0.6502994894981384
# param: conv_block_1.2.bias, grad.abs().sum(): 0.020427709445357323
# param: conv_block_2.0.weight, grad.abs().sum(): 0.8013879656791687
# param: conv_block_2.0.bias, grad.abs().sum(): 0.0417448952794075
# param: conv_block_2.2.weight, grad.abs().sum(): 0.5412620902061462
# param: conv_block_2.2.bias, grad.abs().sum(): 0.11074228584766388
# param: classifier.1.weight, grad.abs().sum(): 1004.659423828125
# param: classifier.1.bias, grad.abs().sum(): 1.0000001192092896

Yes, dividing input images into patches is possible and commonly used in image processing and machine learning. Patch-based methods can improve processing efficiency and allow for localized analysis, and many tools, such as OpenCV and other Python libraries, offer straightforward ways to split images into patches for further analysis or model training.