Understanding 3D Conv Feature Maps

Hi community :slight_smile:

I am using 3D Conv layers in a network where the inputs are stacked images of a subject at 3 points in time. Across the network I keep the depth at 3 although the spatial dimensions of the images are reduced by pooling layers. Visualizing the feature maps for each activation, I was expecting to see patterns relating to each of the 3 slices individually, but it looks like all feature maps are superimpositions of the 3 slices.

For instance, on the feature map at depth index 0 of the first convolution layer, I can see details that look like the input image at the third depth position (index 2). Is that normal behavior and is my intuition about 3D conv layers completely wrong?

Thanks!

nn.Conv3d layers use kernels with 3 volumetric dimensions and thus perform the convolution along all 3 of them. I.e. if your weight has the shape [out_channels, in_channels, depth, height, width], the depth dimension of the kernel is applied along the depth of the input activation. For a kernel depth > 1, neighboring voxels along the depth axis are used (in the same way as neighboring pixels are used for height, width > 1 in an nn.Conv2d), so each output slice mixes information from several input slices.
Let me know if I misunderstood your use case.
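
Here is a small sketch to illustrate the mixing along the depth axis. The shapes, channel counts, kernel sizes, and padding below are assumptions for the demo, since your actual model isn't shown. Only the middle time slice of the input is non-zero, so any activation in the other output slices must come from the kernel reaching across depth:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed input: batch of 1, 1 channel, 3 time points, 8x8 images.
# Only the middle slice (depth index 1) carries a signal.
x = torch.zeros(1, 1, 3, 8, 8)
x[0, 0, 1] = torch.randn(8, 8)

# Kernel depth 3 with depth padding 1 keeps the output depth at 3,
# but each output slice sees its neighboring input slices.
conv_mixing = nn.Conv3d(1, 4, kernel_size=(3, 3, 3), padding=(1, 1, 1), bias=False)

# Kernel depth 1 processes every slice independently (like a per-slice Conv2d).
conv_per_slice = nn.Conv3d(1, 4, kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False)

with torch.no_grad():
    out_mixing = conv_mixing(x)
    out_per_slice = conv_per_slice(x)

# Weight layout: [out_channels, in_channels, depth, height, width]
print(conv_mixing.weight.shape)

# Sum of absolute activations per output depth slice
energy_mixing = out_mixing.abs().sum(dim=(0, 1, 3, 4))
energy_per_slice = out_per_slice.abs().sum(dim=(0, 1, 3, 4))
print("kernel depth 3:", energy_mixing)
print("kernel depth 1:", energy_per_slice)
```

With kernel depth 3, all three output slices should show activity even though only one input slice was non-zero. With kernel depth 1, only the middle output slice does. So if you want feature maps that correspond to individual time points, a kernel depth of 1 (or a Conv2d applied per slice) would be one way to get that, at the cost of not learning any temporal interactions in that layer.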