Order of dimensions for NN-layer weights as per .parameters()

I’m getting started with PyTorch and following the 60 minute Blitz tutorial on neural networks.
When defining a new Conv2d layer, one passes (input_channels, output_channels, kernel_size). But when accessing the layer’s parameters by calling .parameters(), the weight tensor has shape (output_channels, input_channels, kernel_height, kernel_width). What is the rationale behind putting the number of output channels first in the latter case? I find this very confusing.
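For illustration, a minimal sketch of the mismatch, using the layer sizes from the Blitz example (1 input channel, 6 output channels, 3x3 kernel):

```python
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)                    # constructor order: (in_channels, out_channels, kernel_size)

print(conv.weight.shape)                     # torch.Size([6, 1, 3, 3]): out_channels comes first
print([p.shape for p in conv.parameters()])  # [torch.Size([6, 1, 3, 3]), torch.Size([6])]
```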


This is the shape of the Tensor that represents the weights. You can think of the Conv2d operation as matrix multiplication (simplified):

(m x n) . (n x k) => (m x k)

Replace m with output_channels and n with input_channels; the input provides the (n x k) factor. Then it should make sense why output_channels is the first dim in the params.
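To make that concrete, here is a sketch of the "conv as matmul" view using torch.nn.functional.unfold (stride 1, no padding, bias omitted; the kernel dimensions get folded into n, and the sizes are only for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(2, 6, 3, bias=False)      # weight shape: (6, 2, 3, 3)
x = torch.randn(1, 2, 32, 32)

W = conv.weight.reshape(6, -1)             # (m x n): m = output_channels, n = input_channels * kH * kW
cols = F.unfold(x, kernel_size=3)          # (n x k) per sample: k = number of output positions
out = (W @ cols).reshape(1, 6, 30, 30)     # (m x n) . (n x k) => (m x k), reshaped back to the grid

print(torch.allclose(out, conv(x), atol=1e-5))  # True
```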

Thank you for the quick response. Let’s take a look at a concrete example: I have an input tensor of shape (1,1,32,32) and a convolutional layer with weights of shape (6,1,3,3) as per .parameters(). Looking at the corresponding one-layer neural network, I see that its output has shape (1,6,30,30). Ignoring the image dimensions and the kernel dimensions, according to your heuristic one should get
(6 x 1) . (1 x 1) => (6 x 1), but looking at the output this is not the case; instead the dimensions are interchanged. This is very unintuitive and confuses me.
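For reference, a short snippet reproducing the shapes described here:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)
x = torch.randn(1, 1, 32, 32)   # (batch, input_channels, H, W)

print(conv.weight.shape)        # torch.Size([6, 1, 3, 3])
print(conv(x).shape)            # torch.Size([1, 6, 30, 30]): the 6 sits after the batch dimension
```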


Let us ignore the batch size for a moment (it is not part of the weight shape), and let us take something other than 1 for the input channels, say 2. You have (2,32,32) and you want (6,30,30). Omitting the spatial dimensions, what happens at each position is (6 x 2) . (2 x 1) => (6 x 1). Note that the final 1 dim was only appended to the input shape, so you can remove it from the output shape and you end up with (6).
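A minimal tensor-level sketch of those shapes (random placeholder values):

```python
import torch

W = torch.randn(6, 2)        # weight: output_channels x input_channels
x = torch.randn(2)           # the input channels at one spatial position

col = x.unsqueeze(1)         # append a trailing 1-dim: (2 x 1)
out = W @ col                # (6 x 2) . (2 x 1) => (6 x 1)

print(out.squeeze(1).shape)  # torch.Size([6]): remove the appended 1 again
```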

But by that line of reasoning, couldn’t we just as well do the following:
Again, I have (2,32,32) and want (6,30,30). I omit the spatial dimensions, append a 1 in front and have (1 x 2) . (2 x 6) => (1 x 6). Again, I remove the 1 and end up with (6). That way we would have kept the order of input dimension first, output dimension second for the kernel/layer weights, identical to the order used when actually defining the layer. Why do it the other way around?
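For concreteness, a small sketch showing that both orderings compute the same values (the (2 x 6) matrix here is just the transpose of PyTorch’s (6 x 2) weight, with random placeholder values):

```python
import torch

W = torch.randn(6, 2)              # PyTorch's layout: output_channels x input_channels
x = torch.randn(2)                 # input channels at one spatial position

col_form = W @ x.unsqueeze(1)      # (6 x 2) . (2 x 1) => (6 x 1)
row_form = x.unsqueeze(0) @ W.t()  # (1 x 2) . (2 x 6) => (1 x 6)

print(torch.allclose(col_form.squeeze(1), row_form.squeeze(0)))  # True: same numbers, different layout
```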


Can someone respond to the point in my last post?
