Linear layer with unknown dimension

Hello everyone,

I have a tensor of size (batch_size, seq_length, embed_dim), and after passing it through a linear layer I would like to get an output of size (batch_size, latent_dim). The relevant part of the code is shown below

class TransformerEncoder(nn.Module):
    def __init__(self, head, embedding_size, pos_embed, latent_dim = 512):
        super(TransformerEncoder, self).__init__()
        self.head = head
        self.embedding_size = embedding_size
        self.pos_embed = pos_embed 
        self.self_attention = SelfAttention(head, embedding_size)
        self.fc1 = nn.Linear(embedding_size, latent_dim)
        self.fc2 = nn.Linear(embedding_size, latent_dim)
        self.dropout = nn.Dropout(0.1)
    def forward(self, x):
        batch_size, max_words = x.size()[:2]
        x = self.self_attention(x) 
        x = self.dropout(x)
        pos_x = self.pos_embed.get_postional_embeddings()
        x = x + pos_x
        x = x.view(batch_size, -1)  # (batch_size, seq_length * embedding_size)
        mean, logvar = self.fc1(x), self.fc2(x)  # mismatch: fc1 expects embedding_size features
        return mean, logvar

The variable x has 3 dimensions, and I don't know the values of batch_size and seq_length until forward propagation starts, so there will be a size mismatch in the operations self.fc1(x) and self.fc2(x). One solution I can think of is to assign a large value to seq_length and pad all the inputs to that length, but this would slow down the learning process. Is there anything else that can be done?
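For reference, here is a minimal standalone snippet (with made-up sizes, leaving out the attention and positional-embedding parts) that reproduces the mismatch:

```python
import torch
import torch.nn as nn

embed_dim, latent_dim = 16, 8
fc1 = nn.Linear(embed_dim, latent_dim)  # in_features = embed_dim

# seq_length (here 10) is only known once the batch arrives
x = torch.randn(4, 10, embed_dim)       # (batch_size, seq_length, embed_dim)
flat = x.view(4, -1)                    # (4, 160): seq_length * embed_dim features

try:
    fc1(flat)                           # fc1 expects 16 features, gets 160
except RuntimeError as e:
    print("size mismatch:", e)
```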


nn.Linear accepts a variable number of dimensions as explained in the docs:

Input: (N, *, H_in) where ∗ means any number of additional dimensions and H_in = in_features

So you should be able to pass the input in the shape [batch_size, *any_number_of_dimensions, in_features] to the linear layers.
With this approach the linear layer is applied to the last dimension, for every position in the * dimension(s).


But the batch_size value won't be known at the time of initializing self.fc1 and self.fc2. The only information I have is the embedding length and the latent dimension length.

You don’t need to know the batch size during initialization of the layer, since it will accept any batch size (as long as your system or GPU doesn’t run out of memory).

I didn’t mean to give importance to batch_size in my previous reply. The issue is that the input to self.fc1 is supposed to be (batch_size, seq_length * embed_dim) and the output (batch_size, latent_dim). From what I understood from your answer, my input would be a (batch_size, seq_length, embed_dim) shaped tensor and the output a (batch_size, seq_length, latent_dim) shaped tensor, which is different from what I wanted.

Thanks for clarifying the shapes.
In that case you could try padding the input to the maximum expected sequence length.
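A minimal sketch of that approach (assuming a hypothetical max_seq_length and made-up sizes; F.pad zero-pads the sequence dimension):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, latent_dim, max_seq_length = 16, 8, 32
# in_features is now fixed at max_seq_length * embed_dim
fc1 = nn.Linear(max_seq_length * embed_dim, latent_dim)

x = torch.randn(4, 10, embed_dim)  # this batch happens to have seq_length=10
# pad dim -2 (the sequence dim) on the right up to max_seq_length
x = F.pad(x, (0, 0, 0, max_seq_length - x.size(1)))  # -> (4, 32, 16)
mean = fc1(x.view(x.size(0), -1))                    # -> (4, latent_dim)
print(mean.shape)                                    # torch.Size([4, 8])
```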