Batch processing in Linear layers


I am trying to understand how to process batches in an nn.Linear layer. Since nn.Linear is defined using (in_features, out_features), I am not sure how I should handle batches of data. I am currently processing all batches at once in the forward pass, using

# input_for_linear has the shape [nr_of_observations, batch_size, in_features]
input_for_linear.view(-1, batch_size * in_features)                     

as my input, i.e. flattening all the batches out. My linear layer is defined as:

linear = nn.Linear(batch_size * in_features, out_features)

This approach, however, creates an unnecessary number of parameters in the linear layer, as it treats each observation in a batch differently depending on its position. With lots of data and small batch sizes, this may average out over many epochs, so perhaps it is not crucial to change? (Right?)
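To put a number on that overhead, here is a small sketch comparing the parameter counts of the two layer definitions. The sizes (batch_size=4, in_features=4, out_features=8) are hypothetical values chosen only for illustration.

```python
import torch.nn as nn

# Hypothetical sizes, for illustration only
batch_size, in_features, out_features = 4, 4, 8

def num_params(module: nn.Module) -> int:
    # Count all weights and biases of the module
    return sum(p.numel() for p in module.parameters())

flattened = nn.Linear(batch_size * in_features, out_features)  # current approach
per_sample = nn.Linear(in_features, out_features)              # one layer shared by all samples

print(num_params(flattened))   # 136  (16 * 8 weights + 8 biases)
print(num_params(per_sample))  # 40   (4 * 8 weights + 8 biases)
```

The flattened layer's weight grows linearly with batch_size, and the layer is also tied to that fixed batch_size.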

Should I instead use a for-loop over the batches in input_for_linear and pass each one to a layer defined with:

linear = nn.Linear(in_features, out_features)

And then finally combine the outputs back into a batch? The results from the linear layers are used in a Variational Autoencoder with LSTMs, which are capable of handling batched data.

Perhaps there is a smarter solution?

Kind Regards,


You don’t need to change anything, as nn.Linear (like all other layers) accepts batched data.
Just pass your input as [batch_size, nb_features] to the module and the output will be [batch_size, out_features].
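As a minimal sketch (the sizes are arbitrary), the following checks that a single nn.Linear(in_features, out_features) applied to the whole batch gives the same result as the for-loop approach from the question, i.e. looping over the samples and stacking the outputs back into a batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, in_features, out_features = 4, 4, 8  # arbitrary example sizes

linear = nn.Linear(in_features, out_features)
x = torch.randn(batch_size, in_features)          # [batch_size, in_features]

# Batched call: one matrix multiply for the whole batch
out_batched = linear(x)                           # [batch_size, out_features]

# Manual loop over samples, then stack the results back into a batch
out_looped = torch.stack([linear(sample) for sample in x])

print(out_batched.shape)                          # torch.Size([4, 8])
print(torch.allclose(out_batched, out_looped, atol=1e-6))  # True
```

Both paths use the same weight matrix for every sample, so the loop adds no modelling benefit, only Python overhead.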


I am not quite sure how to pass my input as [batch_size, nb_features], since I have a time series of 500 observations with 4 variables, and 4 of these series in a batch. (These parameters may change as my project progresses.)

How would I pass [batch_size, nb_features] to the linear input layer, and how should the linear layer be defined? Also, what is nb_features in this case?

I am currently passing this:

# input_for_linear has the shape [nr_of_observations, batch_size, in_features]
input_for_linear.view(-1, batch_size * in_features)       

Thank you for your help.

It depends on how you would like to process this input.
nn.Linear uses a fully-connected weight matrix, so every input feature contributes to every output value.
Since you are using a temporal signal, you could create a tensor of shape [batch_size, sequence_length * features], or alternatively let the linear layer process each “time stamp” separately by passing an input of shape [batch_size, sequence_length, features].
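Here is a rough sketch of both options for the data described above (500 observations, 4 variables, 4 series per batch), starting from the [nr_of_observations, batch_size, in_features] layout used in the question. The output size of 16 is a hypothetical choice.

```python
import torch
import torch.nn as nn

seq_len, batch_size, n_vars = 500, 4, 4   # sizes from the question
out_features = 16                          # hypothetical output size

# Input laid out as in the question: [nr_of_observations, batch_size, in_features]
x = torch.randn(seq_len, batch_size, n_vars)

# Option 1: one long feature vector per series -> [batch_size, seq_len * n_vars]
x_flat = x.permute(1, 0, 2).reshape(batch_size, seq_len * n_vars)
lin_whole = nn.Linear(seq_len * n_vars, out_features)
print(lin_whole(x_flat).shape)         # torch.Size([4, 16])

# Option 2: the same layer applied to every time step -> [batch_size, seq_len, n_vars]
x_batch_first = x.permute(1, 0, 2)
lin_step = nn.Linear(n_vars, out_features)
print(lin_step(x_batch_first).shape)   # torch.Size([4, 500, 16])
```

Since nn.Linear only acts on the last dimension, option 2 would also work directly on the original [500, 4, 4] layout (giving [500, 4, 16]); the permute only makes the output batch-first. Option 1 ties the layer to a fixed sequence length, while option 2 works for any length.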

For temporal signals, you should also have a look at e.g. nn.Conv1d, as it might be beneficial for your use case.
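As a rough sketch of the nn.Conv1d alternative: Conv1d expects [batch_size, channels, length], so the 4 variables become channels and the 500 observations become the length. The kernel size, padding and out_channels below are arbitrary illustrative choices, not a recommendation.

```python
import torch
import torch.nn as nn

seq_len, batch_size, n_vars = 500, 4, 4
x = torch.randn(seq_len, batch_size, n_vars)  # [seq_len, batch_size, features]

# nn.Conv1d expects [batch_size, in_channels, length]
x_conv = x.permute(1, 2, 0)                   # [4, 4, 500]

# Arbitrary hyperparameters; padding=1 keeps the length at 500 for kernel_size=3
conv = nn.Conv1d(in_channels=n_vars, out_channels=8, kernel_size=3, padding=1)
out = conv(x_conv)
print(out.shape)                              # torch.Size([4, 8, 500])
```

Unlike the flattened linear layer, the convolution shares its weights across time steps and only looks at a local window (here 3 steps) at a time.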

in_features defines the number of input features for the linear layer (this is what nb_features referred to above).


Hi! I am a little confused by this part too…
Let’s say I pass an input of shape [batch_size, sequence_length, feature_dim] to the linear layer.
Does this mean that the size of the last dimension (feature_dim in this case) has to match the input dimension of the linear layer? Is that right?


Yes, that’s correct, as seen here:

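import torch
import torch.nn as nn

x = torch.randn(2, 3, 4)   # [batch_size, sequence_length, feature_dim]
lin = nn.Linear(4, 10)     # in_features must equal feature_dim (4)
out = lin(x)
print(out.shape)           # torch.Size([2, 3, 10])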
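To see the flip side of this rule, here is a quick check (sizes arbitrary) that an input whose last dimension does not match in_features is rejected. PyTorch raises a RuntimeError in this case; the exact message text can differ between versions, so the sketch does not rely on it.

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 10)

ok = lin(torch.randn(2, 3, 4))       # last dim 4 == in_features: works
print(ok.shape)                      # torch.Size([2, 3, 10])

try:
    lin(torch.randn(2, 4, 3))        # last dim 3 != in_features (4)
    raised = False
except RuntimeError:
    raised = True
print(raised)                        # True
```

Only the last dimension is constrained; the leading dimensions can have any size.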