nn.Linear parameter meaning

Just started PyTorch, and I can't understand the meaning of nn.Linear(input_features, output_features, bias=True).
From my point of view, a linear regression is determined by its weight and bias, so what do input_features and output_features mean?

Okay, let me explain this:
The input feature count simply means how many features of a single data point are fed into the layer, the output feature count is the size of the output, and the bias is a vector of values added to the product of the input and the weights.

Say you have training data with shape (N, F), where N is the number of data points and F is the number of features per data point.
If we select a single data point from the training data, its shape will be (1, F).
Let's assume F = 5, so there are 5 features per data point.

In this case, if your network is a single dense/linear layer, the input feature count is 5.
The output feature count will be the number of values per target label (continuous or categorical), simply because it's a single-layer network.
So let's assume each target in your training data is a single value; your output feature count will then be 1.
Your linear layer will look something like this:
nn.Linear(5, 1)
Now that these numbers have been specified, the layer holds 5 × 1 weight values and 1 bias value, because the output neuron/feature count is 1. (Note that PyTorch actually stores the weight transposed, as a tensor of shape (out_features, in_features) = (1, 5), and computes x @ weight.T + bias; in the math below we'll use the (5, 1) view, and treat the bias as shape (1, 1) so the shapes line up.)
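As a quick sanity check, here is a minimal sketch that instantiates the layer and prints the parameter shapes; remember that PyTorch stores the weight as (out_features, in_features):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single linear layer: 5 input features -> 1 output feature
layer = nn.Linear(5, 1)

# PyTorch stores the weight as (out_features, in_features) and
# computes y = x @ weight.T + bias under the hood
print(layer.weight.shape)  # torch.Size([1, 5])
print(layer.bias.shape)    # torch.Size([1])

x = torch.randn(1, 5)      # one data point with 5 features
y_pred = layer(x)
print(y_pred.shape)        # torch.Size([1, 1])
```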

So let's denote the weights by 'w', the bias by 'b', the input by 'x', the output by 'y_pred', and the target for a data point by 'y_actual':

y_pred = xw + b ≠ wx + b (because of matrix multiplication rules)

Now the shape of y_pred must always match the shape of y_actual so that the loss can be computed.
Also, since (as said above) there is 1 value per target, the target data has shape (N, 1); therefore the target for a single data point (y_actual) has shape (1, 1).

Now, recalling matrix multiplication, let's check that the shapes are compatible:

y_pred = (1, 5) × (5, 1) + (1, 1)

The shape of y_pred is (1, 1) for a single data point, which is correct because it matches that of y_actual.
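The shape check above can be reproduced directly in PyTorch; this small sketch also shows why the order of the multiplication matters:

```python
import torch

torch.manual_seed(0)

x = torch.randn(1, 5)   # one data point, F = 5 features
w = torch.randn(5, 1)   # weights in the (in_features, out_features) math view
b = torch.randn(1, 1)   # bias

y_pred = x @ w + b      # (1, 5) @ (5, 1) + (1, 1) -> (1, 1)
print(y_pred.shape)     # torch.Size([1, 1])

# Swapping the order gives a completely different result:
# (5, 1) @ (1, 5) produces a (5, 5) matrix, not a prediction
wrong = w @ x
print(wrong.shape)      # torch.Size([5, 5])
```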

Now, these weights and biases are initialized randomly behind the scenes, so you won't see them unless you print them out.
You can also initialize the weights however you please, but be careful not to make them too large (which can cause exploding gradients) or identical in every cell (which stops the neurons from learning different features).
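If you do want to set the initialization yourself, here's one possible sketch using torch.nn.init (Xavier/Glorot uniform is just one reasonable choice; it keeps the initial weights small):

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 1)

# Overwrite the default initialization in place:
# Xavier/Glorot keeps initial weights small and varied,
# avoiding both overly large and identical starting values
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

# Print the parameters to actually see them
print(layer.weight)
print(layer.bias)
```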
Sometimes you'll want to set bias=False simply because you don't need it, e.g. when you use batch normalization right after the layer (batch norm has its own learnable shift, which makes the bias redundant).
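For example, a small sketch of a model where the first linear layer drops its bias because a BatchNorm1d follows it (the layer sizes here are just made up for illustration):

```python
import torch
import torch.nn as nn

# BatchNorm's own learnable shift (beta) makes the preceding
# linear layer's bias redundant, so it can be disabled
model = nn.Sequential(
    nn.Linear(5, 8, bias=False),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

x = torch.randn(4, 5)   # a small batch of 4 data points
y_pred = model(x)
print(y_pred.shape)     # torch.Size([4, 1])
print(model[0].bias)    # None -- no bias parameter was created
```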

I know that was a lot, but hopefully you now understand this in full detail.


So, in summary: the input and output feature parameters are used to determine the shapes of the weight and bias.