I have a tensor P with dimensions (batch-size x num-layers x length x embedding-size).

I want to concatenate the embeddings across all layers, so eventually, I want a tensor with the following dimensions:

(batch-size x length x num-layers*embedding-size)

Let’s take an example:

`P = torch.randn(10,3,105,1024)`

where batch-size = 10, num-layers = 3, length-of-sentence = 105, embedding-size = 1024.

I want to concatenate the embeddings of the 3 layers for each time step in the sentence.

One way I can do this is:

```
batch_size = 10
concats = []
for idx in range(batch_size):
    # concatenate the 3 layer embeddings along the feature dimension -> (105, 3072)
    concats.append(torch.cat([P[idx][0], P[idx][1], P[idx][2]], dim=1)[None, :, :])
# stack the per-sentence results back into a batch -> (10, 105, 3072)
Q = torch.cat(concats, dim=0)
```

Q’s dimensions: (10, 105, 3072)

Note that `R = P.view(10, 105, -1)` also gives me a tensor with the same dimensions as Q, but it is a different tensor than Q: since `view` only reinterprets the existing memory layout, the first row of R is time steps 1, 2, 3 of the first layer concatenated, rather than the three layers at time step 1.
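To make the difference concrete, here is a small check (a minimal sketch using the example shapes above, with freshly generated P rather than the variables defined earlier):

```
import torch

P = torch.randn(10, 3, 105, 1024)
R = P.view(10, 105, -1)

# R's first "time step" is actually time steps 1-3 of the first layer concatenated
assert torch.equal(R[:, 0], torch.cat([P[:, 0, 0], P[:, 0, 1], P[:, 0, 2]], dim=-1))

# whereas the intended Q[:, 0] is the three layers at time step 1
intended_first_step = torch.cat([P[:, 0, 0], P[:, 1, 0], P[:, 2, 0]], dim=-1)
assert not torch.equal(R[:, 0], intended_first_step)
```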

Is there a faster, more memory-efficient way of getting the Q tensor?