I have a tensor P with dimensions (batch-size x num-layers x length x embedding-size).

I want to concatenate the embeddings across all layers, so eventually, I want a tensor with the following dimensions:

(batch-size x length x num-layers*embedding-size)

Let’s take an example:

`P = torch.randn(10,3,105,1024)`

where batch-size = 10, num-layers = 3, length-of-sentence = 105, embedding-size = 1024.

I want to concatenate the embeddings of the 3 layers for each time step in the sentence.

One way I can do this is:

```
batch_size = 10
concats = []
for idx in range(batch_size):
    # concatenate the 3 layer embeddings along the feature dimension -> (105, 3072)
    concats.append(torch.cat([P[idx][0], P[idx][1], P[idx][2]], dim=1)[None, :, :])
# stack the per-sentence results back into a batch -> (10, 105, 3072)
Q = torch.cat(concats, dim=0)
```

Q’s dimensions: (10, 105, 3072)

Note that `R = P.view(10, 105, -1)` also gives me a tensor with the same dimensions as Q, but it is a different tensor than Q: since `view` only reinterprets the existing memory layout, the first row of R is time steps 1, 2, 3 of the first layer concatenated, rather than the three layers at time step 1.
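To make the difference concrete, here is a small check (a minimal sketch using the example shapes above, with freshly generated P rather than the variables defined earlier):

```
import torch

P = torch.randn(10, 3, 105, 1024)
R = P.view(10, 105, -1)

# R's first "time step" is actually time steps 1-3 of the first layer concatenated
assert torch.equal(R[:, 0], torch.cat([P[:, 0, 0], P[:, 0, 1], P[:, 0, 2]], dim=-1))

# whereas the intended Q[:, 0] is the three layers at time step 1
intended_first_step = torch.cat([P[:, 0, 0], P[:, 1, 0], P[:, 2, 0]], dim=-1)
assert not torch.equal(R[:, 0], intended_first_step)
```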

Is there a faster, more memory-efficient way of getting the Q tensor?