Understanding 3D Conv Feature Maps

Gautier30 · July 15, 2024, 12:40pm

Hi community

I am using 3D Conv layers in a network where the inputs are stacked images of a subject at 3 points in time. Across the network I keep the depth at 3 although the spatial dimensions of the images are reduced by pooling layers. Visualizing the feature maps for each activation, I was expecting to see patterns relating to each of the 3 slices individually, but it looks like all feature maps are superimpositions of the 3 slices.

For instance, on the feature map indexed depth 0 of the first convolution layer, I can see details that look like the input image at depth 3. Is that normal behavior and is my intuition about 3D conv layers completely wrong?

Thanks!

ptrblck · July 15, 2024, 3:33pm

nn.Conv3d layers use kernels with 3 volumetric dimensions and thus also perform the convolution across these 3 dimensions. I.e. if your weight has the shape [out_channels, in_channels, depth, height, width], the depth dimension of the kernel is applied along the depth of the input activation. For a kernel depth > 1, neighboring voxels along the depth are used (in the same way as neighboring pixels are used for height, width > 1 in an nn.Conv2d), so each output slice mixes information from several input slices.
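A minimal sketch of this effect, assuming a single-channel input of depth 3 and made-up layer sizes (the tensor shape, channel counts, and the helper `depth0_changes` are illustrative only, not taken from your model). It perturbs one input depth slice at a time and checks whether output depth 0 changes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical input: batch 1, 1 channel, depth 3 (time points), 8x8 images.
x = torch.randn(1, 1, 3, 8, 8)

# Kernel depth 3 with padding 1 keeps the output depth at 3,
# but each output slice sees its neighboring input slices.
conv_mixed = nn.Conv3d(1, 4, kernel_size=3, padding=1)

# Kernel depth 1: each output slice only sees the matching input slice.
conv_sliced = nn.Conv3d(1, 4, kernel_size=(1, 3, 3), padding=(0, 1, 1))


def depth0_changes(conv, x, perturbed_slice):
    """True if output depth 0 changes when one input depth slice is perturbed."""
    x2 = x.clone()
    x2[:, :, perturbed_slice] += 1.0
    with torch.no_grad():
        out_a = conv(x)[:, :, 0]
        out_b = conv(x2)[:, :, 0]
    return not torch.allclose(out_a, out_b, atol=1e-6)


# One flag per perturbed input slice (0, 1, 2).
mixed_dep = [depth0_changes(conv_mixed, x, d) for d in range(3)]
sliced_dep = [depth0_changes(conv_sliced, x, d) for d in range(3)]

print("kernel depth 3:", mixed_dep)   # depends on input slices 0 and 1
print("kernel depth 1:", sliced_dep)  # depends on input slice 0 only
```

With a kernel depth of 3, output depth 0 reacts to input slices 0 and 1. Stacking a second such layer would extend that dependence to slice 2 as well, since the receptive field grows with every layer. If you want the slices to stay separate, a kernel depth of 1 as in `conv_sliced` is one option.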
Let me know if I misunderstood your use case.