I am in the process of tackling my first CNN challenge, and so far what has amazed me is that PyTorch offers an almost easy fix for anything needed. Except for patch making.
I have 3-dimensional data samples with shapes around 500x500x500. Such a huge piece of data of course can't be fed to the network in one piece, and therefore patching is required. In my case the dimensions are not consistent and I do not allow for padding, hence some fairly complex coding is required on my part to create and reassemble these patches.
Say you want to create 64x64x64 patches out of a 500x500x500 data sample, what do you guys do?
PyTorch solves batch making for you with the DataLoader, but what about patch making? Does PyTorch really not have anything to offer in that regard?
Well, the suggestion you made is something I am already doing. In my case that is a solution with hundreds of lines of code, and it involves at least 3 nested for loops.
And when you do not allow for padding and want the patches to overlap instead, it becomes even more complex.
I don't know that it is not a general need. I can tell by now that it must come up in almost any project that involves medical imaging.
As you can see, patches will give you 7*7*7=343 patches, each of shape [64, 64, 64].
If you would like to overlap the patches, you should change the stride for each dimension.
Let me know if that works for you.
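For reference, a minimal sketch of the unfold-based approach being discussed (assuming a single-channel [500, 500, 500] volume; the variable names are illustrative). Calling `unfold` once per spatial dimension with kernel size equal to stride gives non-overlapping patches:

```python
import torch

# A sketch, assuming a single-channel volume of shape [500, 500, 500].
# unfold is applied once per spatial dimension; kernel == stride -> no overlap.
x = torch.randn(500, 500, 500)
k = 64                            # kernel size per dimension
s = 64                            # stride per dimension

patches = x.unfold(0, k, s).unfold(1, k, s).unfold(2, k, s)
print(patches.shape)              # torch.Size([7, 7, 7, 64, 64, 64])

# Flatten the 7x7x7 patch grid into a single patch dimension
patches = patches.contiguous().view(-1, k, k, k)
print(patches.shape)              # torch.Size([343, 64, 64, 64])
```

To overlap the patches, pass a stride smaller than the kernel size to each `unfold` call.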
500 % 64 is not zero, so of course you cannot reconstruct the original 500x500x500 volume exactly, but can you reassemble the 343 patches back into something that makes sense?
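For the non-overlapping case I can at least picture the inverse: since 7 * 64 = 448, undoing the unfolds should give back a 448^3 crop of the original, with the trailing 52 voxels per axis simply never covered. A sketch of what I mean (the permute order is mine):

```python
import torch

# A sketch of reassembling non-overlapping 64^3 patches. 7 * 64 = 448, so
# the reassembled volume covers only the first 448 voxels per dimension.
x = torch.randn(500, 500, 500)
k = 64
patches = x.unfold(0, k, k).unfold(1, k, k).unfold(2, k, k)  # [7, 7, 7, 64, 64, 64]

# Invert the unfolds: interleave each grid axis with its per-patch axis,
# then merge the pairs back into full spatial dimensions.
recon = patches.permute(0, 3, 1, 4, 2, 5).contiguous().view(7 * k, 7 * k, 7 * k)
print(torch.equal(recon, x[:448, :448, :448]))  # True
```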
I have been playing around with your fine example.
I'm not sure it works correctly when it comes to changing the stride.
Consider the dimensions 407x301x360, and define a stride of 20 and a kernel of 64.
According to my calculation that should yield 20x15x18 = 5400 patches. Alas, your code only yields 3240 patches.
I might be calculating it wrong.
In my project I create 64-dim patches, and when processed through the network they yield 24-dim result patches. These patches need to be created with enough overlap to capture the entire screening, and they need to be put back together in the same order the patches were extracted from the original screening.
This is a little illustration I made.
The black box is the original screening.
The small red boxes are the resulting 24-dim patches.
The gray box is a 64-dim patch.
The different colored boxes illustrate how the stride has to work.
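The arithmetic behind the illustration, sketched in one dimension: a 64-dim input patch shrinks to a 24-dim output, so for the red result boxes to tile the screening without gaps, consecutive gray input patches have to start 24 voxels apart (stride 24, overlap 40). Along an axis of length 407 (matching the dimensions from my earlier example), the start positions might look like this, with the last patch shifted back so it still fits without padding (the helper name is mine):

```python
def patch_starts(length, kernel=64, stride=24):
    """Start positions of overlapping 1-D patches covering [0, length), no padding.

    The last patch is clamped so it ends exactly at `length`, which means it
    may overlap its predecessor by more than kernel - stride.
    """
    starts = list(range(0, length - kernel + 1, stride))
    if starts[-1] + kernel < length:   # tail not covered: add a clamped final patch
        starts.append(length - kernel)
    return starts

starts = patch_starts(407)
print(len(starts), starts[-1])         # 16 patches, the last starting at 343
```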
I am using Conv3d. A CT screening is not a 2-dimensional image; it is a 3-dimensional voxel space (NIfTI/DICOM).
But as far as I can see, x in your example is 3-dimensional (technically 4). So the problem is perhaps that you are only striding along 2 axes and not 3?
Only guessing, and perhaps I am the one miscalculating.
Yeah, I assumed you are using a 3-dim medical image. That’s why I also used the “channel” dimension to create the patches. Otherwise the complete channels would be used in each 2-dim patch.
The size calculation would be the same for nn.Conv3d, but apparently I posted the definition for nn.Conv2d.
Your calculation ignores the kernel size and assumes some padding. Note that the "last" kernel might not fit into your input for a certain stride.
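The number of patches per dimension follows the usual conv output-size formula without padding, floor((size - kernel) / stride) + 1. For the 407x301x360 example with kernel 64 and stride 20, this reproduces the 3240 patches rather than 5400:

```python
def num_patches(size, kernel=64, stride=20):
    # floor((size - kernel) / stride) + 1, i.e. no padding,
    # and a trailing partial window is dropped.
    return (size - kernel) // stride + 1

counts = [num_patches(d) for d in (407, 301, 360)]
print(counts)                           # [18, 12, 15]
print(counts[0] * counts[1] * counts[2])  # 3240
```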
Hi ptrblck, I've been trying unfold. How would I get the batch size right? Should I just feed the reshaped tensor with (num_patches_in_y * num_patches_in_x * batch_size, height, width, channel) shape?
Assuming you are using this code snippet, you wouldn't have to change anything besides assigning a different batch size to the input and to the view operation:
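A minimal sketch of the kind of snippet being referred to (assuming a 4-D input [batch, channel, height, width] and 2-D patches; the variable names are illustrative). Only the input and the final view depend on the batch size:

```python
import torch

B, C, H, W = 4, 3, 128, 128       # B can be changed freely
kh, kw = 64, 64                   # patch size
sh, sw = 64, 64                   # stride

x = torch.randn(B, C, H, W)
patches = x.unfold(2, kh, sh).unfold(3, kw, sw)   # [B, C, nH, nW, kh, kw]
nh, nw = patches.size(2), patches.size(3)

# Merge batch and patch-grid dims; keep channels as a proper channel axis
patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous().view(B * nh * nw, C, kh, kw)
print(patches.shape)              # torch.Size([16, 3, 64, 64])
```

Note that PyTorch conv layers expect channel-first tensors, so the merged result is [N, C, height, width] rather than the channel-last shape in your question.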