How to feed patches of a high-resolution image for classification

I am trying to solve an image classification problem. I want to keep the original image size, feed patches of each image, and then classify its category.
I am quite unclear on a few points:
1) How do I extract the patches? I have already resized the images from different aspect ratios to 2k x 2k.
2) What would be the optimal way to train: extract the patches at runtime or feed patches made beforehand?
3) How will inference pan out in this case? I want to build an interface that accepts the complete image and not just a patch.

Could you explain the general use case a bit more?
What would the final prediction be? Would you apply some kind of voting using the predictions for all patches of the current image or would each patch get a prediction and you wouldn’t care about the original image?

Based on the use case, there might be different approaches.
E.g. you could use torchvision.transforms.FiveCrop to create 5 patches at predefined positions as a preprocessing step or you could use unfold to create the patches manually.


Thanks for replying,

Yes, I am trying to find certain textures in an image and, depending on the texture, classify it into 3 classes.
I want to try both approaches, i.e. the voting-based approach and each patch getting its own prediction (as I have a very small dataset), and then test which one is better.

I don’t think that one of these approaches is better, as you could achieve the same output.
unfold would give you more flexibility, but if you want to use the 4 corner patches and a center one, FiveCrop would be easier than writing it manually.


Is there a faster way to achieve that?
I am using FiveCrop with size 224, but it is taking 35 minutes for 1 epoch.
Also, how will testing pan out in this case? Right now, I am doing a RandomCrop of the same 224 size.

Is FiveCrop slowing down the code, and if so, how long does an epoch take without this transformation, i.e. for a single crop only?

I’m not familiar with your use case and thus asked if you could describe it a bit.

E.g. if the image should get a single prediction, you could use the majority prediction of all 5 patches.
If all predictions are different, you could try to use the prediction with the highest logit.
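A minimal sketch of this majority vote with a highest-logit tie break (the logit values below are made up to force a tie between classes 0 and 1):

```python
import torch

# per-crop logits for one image: 5 crops, 3 classes (made-up values)
crop_logits = torch.tensor([[2.0, 0.1, 0.3],
                            [1.5, 0.2, 0.1],
                            [0.1, 3.0, 0.2],
                            [0.2, 2.5, 0.1],
                            [0.1, 0.2, 0.3]])

crop_preds = crop_logits.argmax(dim=1)           # per-crop predicted class
votes = torch.bincount(crop_preds, minlength=3)  # votes per class -> [2, 2, 1]

tied = (votes == votes.max()).nonzero().squeeze(1)
if len(tied) == 1:
    prediction = tied.item()                     # clear majority
else:
    # tie: among the tied classes, pick the one with the highest single logit
    prediction = tied[crop_logits[:, tied].max(dim=0).values.argmax()].item()

print(prediction)  # -> 1 (classes 0 and 1 tie; class 1 has the highest logit, 3.0)
```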


Yes, if I am using RandomCrop with size 224 then it takes 5 minutes for 1 epoch, while FiveCrop takes 35 minutes.

Okay, let me explain. I am solving a multi-class image classification problem. I have 3 classes and I am using PyTorch’s ImageFolder dataset, so the labels are 0, 1, 2.
My images are (2k, 2k). I can’t downsample the dataset because the classes differ only in the texture of the image, and downsampling makes all the images look the same.
So first I resorted to RandomCrop, but the validation loss skyrockets with RandomCrop.
Then I moved to FiveCrop, which brings the validation loss into the range of the training loss.
I am using the implementation of FiveCrop from the PyTorch documentation, shown below:

transform = Compose([
    FiveCrop(size),  # this is a tuple of PIL Images
    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
])

# In your test loop you can do the following:
input, target = batch  # input is a 5D tensor, target is 2D
bs, ncrops, c, h, w = input.size()
result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
result_avg = result.view(bs, ncrops, -1).mean(1)  # average over crops

Personally, I want inference to look at every part of the image and then give the prediction, but I don’t know how that will pan out in this case as the image size is very large.
Please let me know if I explained my use case clearly. (I’ll describe it again if not.)

Could you profile the data loading as done in the ImageNet example to narrow down which part of the training causes the massive slowdown?
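A minimal sketch of that kind of profiling, measuring only the time spent waiting for batches (the dummy TensorDataset here stands in for the real ImageFolder dataset):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy data standing in for the real ImageFolder dataset
dataset = TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 3, (64,)))
loader = DataLoader(dataset, batch_size=8, num_workers=0)

data_time = 0.0
end = time.perf_counter()
for inputs, targets in loader:
    data_time += time.perf_counter() - end  # time spent fetching this batch
    # ... forward / backward pass would go here ...
    end = time.perf_counter()

print(f"time spent in data loading: {data_time:.4f}s")
```

If this number dominates the epoch time, the data pipeline (transforms, decoding, num_workers) is the bottleneck rather than the model.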


I was using num_workers = 1. That was the bottleneck; it is fixed now.

How does prediction work with FiveCrop? Is it the majority of the classes obtained? I have 3 classes; isn’t there a chance of a tie if the majority class is the prediction, or is each crop’s prediction considered a separate output?

FiveCrop doesn’t automatically change the prediction, so you would have to implement the voting logic yourself.

You could use e.g. majority voting (and in a tie situation pick the prediction with the highest logit), average all output logits and get a single prediction, or probably even train another classifier on top of your predictions and treat your current model as part of an ensemble.
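The logit-averaging option is a one-liner; the stacking idea could look roughly like this (the linear head is a hypothetical example and would of course need to be trained separately on held-out predictions):

```python
import torch
import torch.nn as nn

ncrops, num_classes = 5, 3
crop_logits = torch.randn(ncrops, num_classes)  # stand-in for the per-crop model outputs

# Option 1: average the logits over the crops and take the argmax
avg_prediction = crop_logits.mean(dim=0).argmax().item()

# Option 2 (hypothetical): a small classifier on top of the concatenated crop logits
head = nn.Linear(ncrops * num_classes, num_classes)
stacked_prediction = head(crop_logits.flatten()).argmax().item()
```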