I’m getting started with PyTorch and following the 60 Minute Blitz tutorial on neural networks.

When defining a new `Conv2d` layer, one passes `(in_channels, out_channels, kernel_size)`. But when inspecting the layer’s parameters via `.parameters()`, the weight has shape `[out_channels, in_channels, kernel_height, kernel_width]`. What is the rationale behind putting the number of output channels first in the latter case? I find this very confusing.

This is the shape of the Tensor that represents the weights. You can think of the Conv2d operation as matrix multiplication (simplified):

```
(m x n) . (n x k) => (m x k)
```

Replace `m` with `output_channels`, `n` with `input_channels`, and `k` with the (flattened) data dimension, and it should make sense why `output_channels` is the first dim in the params.
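As a quick sanity check of the shape described above, here is a minimal sketch (the specific channel and kernel sizes are just illustrative):

```python
import torch.nn as nn

# A Conv2d taking 1 input channel to 6 output channels with a 3x3 kernel
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)

# The weight tensor is laid out (out_channels, in_channels, kH, kW)
print(conv.weight.shape)  # torch.Size([6, 1, 3, 3])
```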

Thank you for the quick response. Let’s take a look at a concrete example: I have an input tensor of shape `(1,1,32,32)` and a convolutional layer whose weight has shape `(6,1,3,3)` as per `.parameters()`. Passing the input through the corresponding one-layer network, I see that its output has shape `(1,6,30,30)`. Ignoring the image and kernel dimensions, according to your heuristic one should get:

`(6 x 1) . (1 x 1) => (6 x 1)`, but looking at the output we can see that this is not the case; instead the dimensions are interchanged. This is very unintuitive and confuses me.
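For reference, the concrete example described here can be reproduced with a short sketch (random input, no trained weights):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)          # (batch, in_channels, H, W)
conv = nn.Conv2d(1, 6, kernel_size=3)  # weight shape: (6, 1, 3, 3)

y = conv(x)
print(y.shape)  # torch.Size([1, 6, 30, 30]); 32 - 3 + 1 = 30
```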

Let us ignore the batch size for a moment (it is not part of the `weight` shape), and let us take something other than `1` for `in_channels`, say `2`. You have `(2,32,32)` and you want `(6,30,30)`. Omitting the data dimensions, what happens is: `(6 x 2) . (2 x 1) => (6 x 1)`. Note that the final `1` dim has been *appended* to the input shape, so you can remove it from the output shape and you end up with `(6)`.
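The matrix-multiplication view above can be made literal with `torch.nn.functional.unfold` (im2col); this is a sketch, not how PyTorch computes convolutions internally. Because the weight is stored `(out_channels, in_channels, kH, kW)`, viewing it as a 2-D matrix directly gives the left factor of the product:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 2, 32, 32)
conv = nn.Conv2d(2, 6, kernel_size=3, bias=False)

# Flatten every 3x3 patch of the input into a column: (1, 2*3*3, 30*30)
cols = F.unfold(x, kernel_size=3)

# weight.view(6, -1) is the (6 x 18) left matrix:
# (6 x 18) . (18 x 900) => (6 x 900), then reshape to (6, 30, 30)
w = conv.weight.view(6, -1)
out = (w @ cols[0]).view(6, 30, 30)

assert torch.allclose(out, conv(x)[0], atol=1e-5)
```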

But by that line of reasoning, couldn’t we just as well do the following:

Again, I have `(2,32,32)` and want `(6,30,30)`. I omit the data dimensions, prepend a `1` and have `(1 x 2) . (2 x 6) => (1 x 6)`. Again, I remove the `1` and end up with `(6)`. In this way we would’ve kept the order of input dimension first, output dimension second for the kernel/layer weights, identical to when you actually define a layer. Why do it the other way around?
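Both orderings are mathematically equivalent; they are transposes of each other, so storing the weight either way would work. A small sketch (with made-up sizes) showing that the two conventions give the same result up to a transpose:

```python
import torch

out_ch, in_ch, positions = 6, 2, 5

w_out_first = torch.randn(out_ch, in_ch)  # PyTorch's layout: (out, in)
w_in_first = w_out_first.t()              # the proposed layout: (in, out)

x = torch.randn(in_ch, positions)         # input columns

y1 = w_out_first @ x            # (6 x 2) . (2 x 5) => (6 x 5)
y2 = (x.t() @ w_in_first).t()   # (5 x 2) . (2 x 6) => (5 x 6), transposed back

assert torch.allclose(y1, y2)
```

With the out-first layout the result is produced directly in the `y = Wx` column-vector convention, without the extra transposes; the in-first layout would instead match the row-vector convention `y = xW`. Which one a framework picks is ultimately a design choice.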

Can someone respond to the point in my last post?