3D capsule neural networks: any way to use nn.Conv3d? Or should I vectorize all convolutions?

arman.avesta · July 12, 2021, 11:04pm

Hi gang,

I’m trying to code convolutions in a 3D capsule neural network (the 3D version of this: https://openreview.net/pdf?id=HJWLfGWRb). I tried using nn.Conv3d, which expects the input to be in the shape: input.shape = (batch, channels, depth, height, width). In a normal convolutional network, each element of this input is a number:

input[batch, channel, depth, height, width] = a number.

In a capsule network, however, each element of this input is a 4x4 pose matrix P:

input[batch, channel, depth, height, width] = P = 4x4 matrix

Each element of the kernel is also a 4x4 matrix (e.g. a 3x3x3 kernel contains 3x3x3x4x4 numbers). In a normal convolution, each input value in the receptive field is multiplied by the corresponding kernel element and the products are summed. In a capsule convolution, each input pose matrix (4x4) is matrix-multiplied by the corresponding kernel element (also 4x4), and the resulting matrices are summed to give the 4x4 pose matrix of the next capsule layer at that point (with some details omitted for brevity).
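To pin down the operation I mean, here is a minimal, deliberately slow sketch. It assumes stride 1, no padding, and takes the simplified description above literally (matrix product, then a plain sum, with no routing). The name `capsule_conv3d_naive` and the tensor layouts are made up for illustration.

```python
import torch

def capsule_conv3d_naive(poses, kernel):
    """Reference loop for the simplified capsule convolution (hypothetical helper).

    poses:  (B, C, D, H, W, 4, 4)     one 4x4 pose matrix per voxel and channel
    kernel: (O, C, kD, kH, kW, 4, 4)  one 4x4 matrix per kernel element
    """
    B, C, D, H, W = poses.shape[:5]
    O, _, kD, kH, kW = kernel.shape[:5]
    oD, oH, oW = D - kD + 1, H - kH + 1, W - kW + 1  # stride 1, no padding
    out = poses.new_zeros(B, O, oD, oH, oW, 4, 4)
    for kd in range(kD):
        for kh in range(kH):
            for kw in range(kW):
                # Input poses that meet kernel element (kd, kh, kw) at each output voxel.
                patch = poses[:, :, kd:kd + oD, kh:kh + oH, kw:kw + oW]
                # Matrix product P @ K (over m), summed over input channels c.
                out += torch.einsum("bcdhwim,ocmj->bodhwij",
                                    patch, kernel[:, :, kd, kh, kw])
    return out

poses = torch.randn(2, 3, 5, 5, 5, 4, 4)
kernel = torch.randn(8, 3, 3, 3, 3, 4, 4)
print(capsule_conv3d_naive(poses, kernel).shape)  # torch.Size([2, 8, 3, 3, 3, 4, 4])
```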

I could not find any way to use nn.Conv3d to convolve over 4x4 matrices (instead of simple numbers). Therefore, I’m having to vectorize all convolutions in my network, expressing them as matrix multiplications instead of convolutions (https://arxiv.org/pdf/1501.07338.pdf).
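Roughly, the vectorized version looks like the sketch below: extract sliding windows with `Tensor.unfold`, then contract them against the kernel with `torch.einsum`. It assumes stride 1, no padding, and the simplified "multiply then sum" operation; `capsule_conv3d_unfold` and the layouts are hypothetical, and the contraction can be memory-hungry for large volumes.

```python
import torch

def capsule_conv3d_unfold(poses, kernel):
    """Vectorized sketch (hypothetical helper), stride 1 and no padding.

    poses:  (B, C, D, H, W, 4, 4)
    kernel: (O, C, kD, kH, kW, 4, 4)
    """
    kD, kH, kW = kernel.shape[2:5]
    # Each unfold appends one window axis at the end:
    # (B, C, oD, oH, oW, 4, 4, kD, kH, kW)
    windows = poses.unfold(2, kD, 1).unfold(3, kH, 1).unfold(4, kW, 1)
    # Matrix product over m, summed over input channels c and kernel offsets x, y, z.
    return torch.einsum("bcdhwimxyz,ocxyzmj->bodhwij", windows, kernel)

poses = torch.randn(2, 3, 5, 5, 5, 4, 4)
kernel = torch.randn(8, 3, 3, 3, 3, 4, 4)
print(capsule_conv3d_unfold(poses, kernel).shape)  # torch.Size([2, 8, 3, 3, 3, 4, 4])
```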

I just wanted to make sure I’m not missing something: is there really no way to use nn.Conv3d to convolve over 4x4 matrices? I would greatly appreciate your help.

Arman
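
A possible route, offered as a sketch rather than a confirmed answer: if the operation really is just "matrix-multiply each pose by its kernel element, then sum", it is linear in the pose entries, and row i of the output depends only on row i of each input pose. Under that assumption the rows can be moved into the batch axis and the columns into the channel axis, so a single `torch.nn.functional.conv3d` call does the work. The helper name `capsule_conv3d_folded` and the tensor layouts are hypothetical, and this does not cover routing schemes that need the individual votes before they are summed.

```python
import torch
import torch.nn.functional as F

def capsule_conv3d_folded(poses, kernel):
    """Fold 4x4 matrix products into one ordinary conv3d (hypothetical helper).

    poses:  (B, C, D, H, W, 4, 4)     indices b, c, d, h, w, i, m
    kernel: (O, C, kD, kH, kW, 4, 4)  indices o, c, x, y, z, m, j
    Assumes stride 1, no padding, and a plain sum over the receptive field.
    """
    B, C, D, H, W = poses.shape[:5]
    O, _, kD, kH, kW = kernel.shape[:5]
    # Pose rows i join the batch axis; pose columns m join the input channels.
    x = poses.permute(0, 5, 1, 6, 2, 3, 4).reshape(B * 4, C * 4, D, H, W)
    # Kernel columns j join the output channels; kernel rows m join the input channels.
    w = kernel.permute(0, 6, 1, 5, 2, 3, 4).reshape(O * 4, C * 4, kD, kH, kW)
    y = F.conv3d(x, w)  # (B*4, O*4, oD, oH, oW)
    # Unpack (b, i) and (o, j), then move i, j back to the last two axes.
    y = y.reshape(B, 4, O, 4, *y.shape[2:])
    return y.permute(0, 2, 4, 5, 6, 1, 3)  # (B, O, oD, oH, oW, 4, 4)

poses = torch.randn(2, 3, 5, 5, 5, 4, 4)
kernel = torch.randn(8, 3, 3, 3, 3, 4, 4)
print(capsule_conv3d_folded(poses, kernel).shape)  # torch.Size([2, 8, 3, 3, 3, 4, 4])
```

Stride and padding could presumably be passed straight through to `F.conv3d`, since the fold only touches the batch and channel axes, but that is untested here.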