As far as I understand the documentation, the `BatchNorm1d` layer takes the number of features as its constructor argument: `nn.BatchNorm1d(num_features)`.

As input the layer takes *(N, C, L)*, where *N* is the batch size (I guess…), *C* is the number of features (this is the dimension over which normalization is computed), and *L* is the sequence length.
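For reference, a minimal sketch of that input convention (the sizes here are arbitrary, chosen just for illustration):

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(5)    # C = 5 features / channels
x = torch.rand(3, 5, 7)   # (N, C, L) = (batch, features, length)
out = bn(x)               # each of the 5 channels is normalized
assert out.shape == (3, 5, 7)  # shape is preserved
```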

Let’s assume I have an **input** of the following shape:

`(batch_size, number_of_timesteps, number_of_features)`

which is the usual data shape for time series when *batch_first=True*.

**Question**

Should I transpose the input (swap dimensions 1 and 2) before running batch normalization?

In that case I will have to transpose the output again before feeding it into the *RNN* later. That seems quite awkward to me.

Can someone please take a look at the example below and let me know if this is the proper way to do it?

E.g.:

```python
import torch
from torch import nn

# data: (batch size, number of time steps, number of features)
x = torch.rand(3, 4, 5)

# layers
bn = nn.BatchNorm1d(5)                    # 5 = number of features
rnn = nn.RNN(5, 10, 1, batch_first=True)

# computation - transpose TWICE: (N, L, C) -> (N, C, L) -> back to (N, L, C)
x_normalized = bn(x.transpose(1, 2)).transpose(1, 2)
output, h_n = rnn(x_normalized)
```
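As a sanity check on my own example, I verified that the double transpose preserves the shape and that (in training mode) the statistics really are computed per feature, over both the batch and time dimensions:

```python
import torch
from torch import nn

torch.manual_seed(0)

x = torch.rand(3, 4, 5)  # (batch, time steps, features)
bn = nn.BatchNorm1d(5)   # expects (N, C, L) with C = number of features

# transpose to (N, C, L), normalize, transpose back to (N, L, C)
x_normalized = bn(x.transpose(1, 2)).transpose(1, 2)

# shape is preserved, so the result feeds straight into a batch_first RNN
assert x_normalized.shape == x.shape

# in training mode BatchNorm1d normalizes each feature over batch and time,
# so the per-feature mean of the output is approximately 0
per_feature_mean = x_normalized.mean(dim=(0, 1))
assert torch.allclose(per_feature_mean, torch.zeros(5), atol=1e-5)
```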