Dimensions in 3D convnet layer

Subhankar_Ghosh · June 17, 2020, 7:57am

I have a 3D CNN whose initial layers look like:

(conv1): Conv3d(3, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(pool1): MaxPool3d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv3d(64, 192, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(pool2): MaxPool3d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv3d(192, 384, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(conv4): Conv3d(384, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(conv5): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(pool5): MaxPool3d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)

Now I have extracted the weights of the 1st layer as:

cnn_weights = model.state_dict()['module.conv1.weight'].cpu()

The shape:
cnn_weights.shape

gives:

torch.Size([64, 3, 3, 3, 3])

Can you all please help me understand what each of those 5 dimensions represents, i.e. which one is height, depth, width, …

Also, in

kernel_size=(3, 3, 3)

what does each of the 3 dimensions represent?

Thanks!

ptrblck · June 17, 2020, 8:32am

The dimensions of the weight tensor are defined as:

[out_channels = number of filters, in_channels, depth, height, width]

The kernel size is therefore [depth, height, width] of each filter.
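One way to check this ordering yourself is to build a layer whose kernel has a different size along each axis, so no two dimensions can be confused. The snippet below is only a minimal sketch: the kernel size (2, 3, 4) and the input size are arbitrary values chosen for illustration and are not taken from the model above.

```python
import torch
import torch.nn as nn

# Illustrative layer: same channels as conv1 above, but with a distinct
# (arbitrary) size per kernel axis so each dimension can be told apart.
conv = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=(2, 3, 4))

# Weight layout: [out_channels, in_channels, depth, height, width]
weight_shape = tuple(conv.weight.shape)
print(weight_shape)

out_channels, in_channels, depth, height, width = weight_shape

# The input uses the matching layout [batch, channels, depth, height, width].
# The sizes here are arbitrary illustration values.
x = torch.randn(1, 3, 8, 16, 16)
out_shape = tuple(conv(x).shape)
print(out_shape)
```

With no padding and stride 1, each spatial axis of the output shrinks by (kernel size - 1) along the same axis, which confirms that the kernel_size tuple is read in the order depth, height, width.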