Seems good. Since the activations (nn.ReLU and nn.Softmax in your example) don’t store learnable parameters, they can be easily replaced with their functional counterparts from torch.nn.functional (F.relu and F.softmax), but this is not necessary (just an additional improvement).
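To illustrate the swap, here is a minimal sketch. The two classes, their names, and the layer sizes are made up for illustration and are not taken from your model. Both networks should give the same output once they share weights, because the activations hold no parameters of their own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModuleNet(nn.Module):
    """Hypothetical model that keeps the activations as modules."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 8)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(8, 3)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        return self.softmax(self.fc2(self.relu(self.fc1(x))))


class FunctionalNet(nn.Module):
    """Same hypothetical model, with functional activations instead."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 8)
        self.fc2 = nn.Linear(8, 3)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return F.softmax(self.fc2(x), dim=1)


torch.manual_seed(0)
module_net = ModuleNet()
functional_net = FunctionalNet()

# The activations add no entries to the state dict, so the weights transfer as-is.
functional_net.load_state_dict(module_net.state_dict())

x = torch.randn(4, 10)
out_module = module_net(x)
out_functional = functional_net(x)
```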

So here is my doubt: when I check the shape of the output, it is not [output_height] but rather [input_height, output_height].

Say my image has input_width=100 and input_height=50, and output_height=10; then my output’s shape is [50, 10]? Am I not supposed to get just a vector of shape [10]?

I'm not sure I fully understand you (you mention input_width two times but with different values). If you are working with images, as you mention, you can flatten them, but don’t expect good results (there is no notion of width or height in a linear layer). For image-related tasks, CNNs are usually used.
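Here is a small sketch of what I suspect is happening. I am assuming your layer looks like nn.Linear(100, 10), which I can't see from your post. A linear layer only acts on the last dimension, so a [50, 100] image is treated as 50 separate rows of 100 features. Flattening first gives a single vector output.

```python
import torch
import torch.nn as nn

image = torch.randn(50, 100)  # one image: height=50, width=100

# Assumed layer: each of the 50 rows is treated as a separate 100-feature sample.
row_wise = nn.Linear(100, 10)
row_out = row_wise(image)  # shape: torch.Size([50, 10])

# Flatten the whole image into one vector of 50 * 100 = 5000 features.
flat_layer = nn.Linear(50 * 100, 10)
flat_out = flat_layer(image.flatten())  # shape: torch.Size([10])

# With a batch of images, keep the batch dimension and flatten the rest.
batch = torch.randn(8, 50, 100)
batch_out = flat_layer(torch.flatten(batch, start_dim=1))  # shape: torch.Size([8, 10])
```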

I guess you are not sure about the input and output dimensions of nn.Linear. I’ll give you an example.

import torch
import torch.nn as nn

linear_layer = nn.Linear(10, 2)  # in_features=10, out_features=2
x = torch.randn(5, 10)  # a batch of 5 vectors
linear_layer(x).shape  # output shape: [5, 2]
x = torch.randn(10)  # a single vector, no batch dimension
linear_layer(x).shape  # output shape: [2]
x = torch.randn(3, 12, 4, 10)  # it even works with 4D data
linear_layer(x).shape  # output shape: [3, 12, 4, 2]

In my example, 10 is the input dimension and 2 is the output dimension. It means that nn.Linear will perform the following transformation: x @ W.T + b (@ stands for matrix multiplication, T means matrix transpose), where W has the shape [2, 10] and b has the shape [2]; b is broadcast across the leading dimensions of the result. Only the last dimension of x changes (from 10 to 2), and all the other dimensions are kept. With this formula in mind and some basic algebra, you can easily conclude what the output shape (given the input) is going to be.
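If you want to convince yourself, you can compute the formula by hand and compare it with the layer's output. This is just a quick check using the layer's weight and bias attributes; the seed and variable names are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear_layer = nn.Linear(10, 2)
W = linear_layer.weight  # shape: [2, 10]
b = linear_layer.bias    # shape: [2]

# 2D input: [5, 10] @ [10, 2] -> [5, 2], then b is broadcast over the 5 rows.
x = torch.randn(5, 10)
manual = x @ W.T + b
same_2d = torch.allclose(manual, linear_layer(x), atol=1e-6)

# 4D input: matmul acts on the last dimension, the leading ones are kept.
x4 = torch.randn(3, 12, 4, 10)
manual4 = x4 @ W.T + b  # shape: [3, 12, 4, 2]
same_4d = torch.allclose(manual4, linear_layer(x4), atol=1e-6)
```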

In PyTorch, it’s really easy to debug the model. Just add print statements (this way you can easily check shapes in the forward method after each transformation).
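For example, something along these lines. DebugNet and its layer sizes are hypothetical, just to show where the print statements go.

```python
import torch
import torch.nn as nn


class DebugNet(nn.Module):
    """Hypothetical two-layer model that prints shapes in forward."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        print("input:", x.shape)
        x = torch.relu(self.fc1(x))
        print("after fc1 + relu:", x.shape)
        x = self.fc2(x)
        print("after fc2:", x.shape)
        return x


model = DebugNet()
out = model(torch.randn(5, 10))  # prints the shape after each step
```

Once the shapes look right, you can remove the prints again.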