# Order of dimensions for NN-layer weights as per .parameters()

I’m getting started with PyTorch and following the 60 Minute Blitz tutorial on neural networks.
When defining a new Conv2d layer, one passes (input_channels, output_channels, kernel_size). But when accessing the layer’s parameters via .parameters(), the weight’s shape is [output_channels, input_channels, kernel_dimensions]. What is the rationale behind putting the number of output channels first in the latter case? I found this very confusing.


This is the shape of the Tensor that represents the weights. You can think of the Conv2d operation as matrix multiplication (simplified):

```
(m x n) . (n x k) => (m x k)
```

Replace `m` with `output_channels` and `n` with `input_channels` (with `k` standing for the number of spatial positions, or simply 1 for a single pixel), and it should make sense why `output_channels` is the first dim of the parameters.
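As a concrete illustration of this matmul view (a minimal sketch; the channel counts and kernel size here are arbitrary), one output position of the convolution is exactly such a product once the weight is flattened to a matrix:

```python
import torch
import torch.nn as nn

# Conv2d stores its weight as (out_channels, in_channels, kH, kW).
conv = nn.Conv2d(in_channels=2, out_channels=6, kernel_size=3)
print(conv.weight.shape)  # torch.Size([6, 2, 3, 3])

# For one receptive-field patch, the convolution reduces to a matrix product:
# (out_channels x in_channels*kH*kW) . (in_channels*kH*kW x 1) => (out_channels x 1)
patch = torch.randn(2, 3, 3)         # one 3x3 patch with 2 input channels
w_mat = conv.weight.view(6, -1)      # (6, 18)
out = w_mat @ patch.reshape(-1, 1) + conv.bias.view(-1, 1)
print(out.shape)                     # torch.Size([6, 1])
```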

Thank you for the quick response. Let’s take a look at a concrete example: I have an input tensor of shape (1,1,32,32) and a convolutional layer whose weight has shape (6,1,3,3) as per .parameters(). Passing the input through this one-layer network, the output has shape (1,6,30,30). Ignoring the image dimensions and the kernel dimensions, according to your heuristic one should get
`(6 x 1) . (1 x 1) => (6 x 1)`, but looking at the output we can see that this is not the case; instead the dimensions are interchanged. This is very unintuitive and confuses me.
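For reference, a minimal reproduction of the shapes described above:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
print(conv.weight.shape)       # torch.Size([6, 1, 3, 3]) -- out_channels first

x = torch.randn(1, 1, 32, 32)  # (batch, in_channels, height, width)
print(conv(x).shape)           # torch.Size([1, 6, 30, 30]) -- batch first, then out_channels
```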


Let us ignore the batch size for a moment (it is not part of the `weight` shape), and let us take something other than `1` for `input_channels`, say `2`. You have `(2,32,32)` and you want `(6,30,30)`. Omitting the data dimensions, what happens is: `(6 x 2) . (2 x 1) => (6 x 1)`. Note that the final `1` dim was appended to the input shape, so you can remove it from the output shape and end up with `(6)`.
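In other words (a toy sketch of the column-vector convention just described, with random values standing in for the data):

```python
import torch

W = torch.randn(6, 2)       # weight as a matrix: (out_channels, in_channels)
x = torch.randn(2, 1)       # one input "pixel" as a column vector, with the final 1 appended
y = W @ x                   # (6 x 2) . (2 x 1) => (6 x 1)
print(y.squeeze(-1).shape)  # torch.Size([6]) once the appended 1 is removed
```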

But by that line of reasoning, couldn’t we just as well do the following:
Again, I have `(2,32,32)` and want `(6,30,30)`. I omit the data dimensions, prepend a `1` and have `(1 x 2) . (2 x 6) => (1 x 6)`. Again, I remove the `1` and end up with `(6)`. That way we would have kept the order of input dimension first, output dimension second for the kernel/layer weights, identical to the order used when you actually define a layer. Why do it the other way around?
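For comparison, a toy sketch of the alternative (row-vector) convention being asked about, where the weight would hypothetically be stored as (input_channels, output_channels):

```python
import torch

W_alt = torch.randn(2, 6)  # hypothetical weight layout: (in_channels, out_channels)
x = torch.randn(1, 2)      # one input "pixel" as a row vector, with a 1 prepended
y = x @ W_alt              # (1 x 2) . (2 x 6) => (1 x 6)
print(y.squeeze(0).shape)  # torch.Size([6]) once the prepended 1 is removed
```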


Can someone respond to the point in my last post?
