I’m not sure what would work best for your use case, but let’s have a look at the differences between both approaches.
I assume your MRI data has a spatial size of 256x256 and contains 125 slices.
If you’re using nn.Conv2d, I would suggest using the slices as the “channels”.
This would mean that each kernel in your conv layer has the defined spatial size, e.g. 3x3, and spans all channels. The kernel shape would be [nb_kernels, 125, 3, 3]. The output is thus computed as a dot product between the small 3x3 window and all 125 slices.
On the other hand, if you are using nn.Conv3d, you could permute the slices into the “depth” dimension and add a singleton dimension for the channels. Your kernel would then be a volume, e.g. of shape [3, 3, 3], so that 3 neighboring slices are used to produce the output at the current position.
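Both approaches could be sketched like this (using the 125 slices from above, but a reduced spatial size of 64x64 just to keep the example light; the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

# Toy MRI volume: 125 slices, reduced spatial size for brevity
x = torch.randn(1, 125, 64, 64)

# Approach 1: treat the 125 slices as the input channels of a Conv2d.
# Each 3x3 kernel spans all slices: weight shape [nb_kernels, 125, 3, 3].
conv2d = nn.Conv2d(in_channels=125, out_channels=8, kernel_size=3, padding=1)
out2d = conv2d(x)
print(out2d.shape)         # torch.Size([1, 8, 64, 64])
print(conv2d.weight.shape) # torch.Size([8, 125, 3, 3])

# Approach 2: move the slices to the "depth" dimension and add a
# singleton channel dimension. Each 3x3x3 kernel now only sees
# 3 neighboring slices at a time.
x3d = x.unsqueeze(1)  # [1, 1, 125, 64, 64]
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
out3d = conv3d(x3d)
print(out3d.shape)  # torch.Size([1, 8, 125, 64, 64])
```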
What would you like your model to learn in the MRI images? I think in a segmentation case the second approach could work better, but that’s just my guess.
I want to use conv3d over a spatial matrix of words. Each cell holds an embedding vector of size 100, so an input has the shape [batch_size, height, width, embedding_dim].
1 - Should I use in_channels = embedding_dim ?
2 - I want to get an output of size [batch_size, height, width, output_size], where output_size is any desired int. After the convolution I want a feature vector of size output_size for each cell, i.e. a spatial feature for each word (cell).
How would I use conv3d in this case? Please suggest the correct usage and if I am thinking correctly.
In the Conv2D case, the expected input is [batch_size, in_channels, height, width]. When we perform a 2D convolution, each filter acts upon all the channels at once; we sort of assume there isn’t a depth aspect to our channels. For example, consider RGB: the channels don’t have to be in that particular order, we just keep them that way by convention.
In the Conv3D case, the expected input is [batch_size, in_channels, depth, height, width]. An example of this would be video: here the order does matter, since we can’t shuffle the frames without losing meaning. We use a 3D convolution to capture that information, where each filter strides across the frames (rather than acting on all of them at once).
Now for your specific case, it doesn’t sound like you have a “depth” aspect to your data. The entire embedding vector as a whole seems important, it doesn’t make sense to me to stride over your embedding dimension. If I’m understanding correctly, I think you want a Conv2d with kernel_size=1 and stride=1. Your in_channels would be embedding_dim, and your output_channels would be whatever feature size. That’ll result in an output of shape [batch_size, output_channels, height, width]. Keep in mind the order of things.
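A minimal sketch of that suggestion, with made-up sizes for everything (the small batch and 10x10 grid are just for illustration):

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration
batch_size, height, width, embedding_dim, output_size = 4, 10, 10, 100, 32

x = torch.randn(batch_size, height, width, embedding_dim)

# Conv2d expects [batch, channels, height, width], so move the embedding
# dimension into the channel position first.
x = x.permute(0, 3, 1, 2)  # [batch, embedding_dim, height, width]

conv = nn.Conv2d(in_channels=embedding_dim, out_channels=output_size,
                 kernel_size=1, stride=1)
out = conv(x)
print(out.shape)  # torch.Size([4, 32, 10, 10])

# Permute back if you need [batch, height, width, output_size]
out = out.permute(0, 2, 3, 1)
print(out.shape)  # torch.Size([4, 10, 10, 32])
```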
I have an input CT of size 100x512x512, and I want to enhance the quality of the CT using a corresponding higher-quality image. Can I use a Conv3d to stride over 3D volumes of neighboring slices to produce a 2D image at the end?
The 3D convolution would return an output volume, but you could try to reduce one of the dimensions (e.g. the depth).
I.e. an input of [batch_size, channels, depth, height, width] would result in an output of [batch_size, out_channels, depth*, height*, width*], where the * shapes are calculated depending on the kernel size, stride, dilation etc.
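One way to shrink the depth dimension while keeping the spatial size is to pad only the height and width (shown here with the 100 slices from above, but a reduced spatial size of 64 instead of 512 just to keep the example light):

```python
import torch
import torch.nn as nn

# Toy CT volume: [batch, channels, depth, height, width]
x = torch.randn(1, 1, 100, 64, 64)

# Padding only height/width keeps those dimensions unchanged, while the
# unpadded 3-slice kernel shrinks the depth from 100 to 98.
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(3, 3, 3),
                 padding=(0, 1, 1))
out = conv(x)
print(out.shape)  # torch.Size([1, 8, 98, 64, 64])
```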
I don’t think you should see a significant difference as long as the data augmentation and transformations are applied to the corresponding dimensions, since the activation volume would just be permuted, i.e. the same would happen if you transpose the height and width of an image.
Thank you for the reply.
Let me ask a question about conv3d that is related to this post.
I have some code in Keras, and I have a problem understanding the “same” padding.
My input shape to this layer is: [128, 2048, 4, 2, 2]
This is my keras code:
combined = Conv3D(128, (3, 3, 3), strides=1, padding='same')(combined)
And I want to do the same in PyTorch. This is what I did:
self.conv1 = nn.Conv3d(2048, 128, kernel_size=(3, 3, 3), stride=1, padding=(1, 1, 1))
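That looks right: for stride=1 and an odd kernel size, Keras’ padding='same' corresponds to padding = kernel_size // 2 in PyTorch, i.e. 3 // 2 = 1 per dimension, so the depth, height, and width stay unchanged. A quick shape check (using a batch of 2 instead of 128 just to keep the example light):

```python
import torch
import torch.nn as nn

# [batch, channels, depth, height, width]; small batch for brevity
x = torch.randn(2, 2048, 4, 2, 2)

# padding=1 with kernel_size=3 and stride=1 preserves the spatial shape,
# matching Keras' padding='same' for this configuration.
conv = nn.Conv3d(2048, 128, kernel_size=(3, 3, 3), stride=1, padding=(1, 1, 1))
out = conv(x)
print(out.shape)  # torch.Size([2, 128, 4, 2, 2])
```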