With the help of a custom collate_fn, I've created batches with variable input lengths (the same sequence length within a batch, but different lengths across batches).
I want to build a sentiment classifier with a custom embedding layer.
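For reference, a collate_fn of the kind described above can be sketched as follows. The `(token_ids, label)` item format is an assumption about how the dataset returns samples; `padding_value=0` assumes index 0 is reserved for padding:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_fn(batch):
    # batch: list of (token_ids_tensor, label) pairs -- assumed item format
    seqs, labels = zip(*batch)
    # pad every sequence to the longest sequence *in this batch*,
    # so each batch has its own fixed length
    padded = pad_sequence(seqs, batch_first=True, padding_value=0)
    return padded, torch.tensor(labels, dtype=torch.float32)

# example: two sequences of different lengths in one batch
x, y = collate_fn([(torch.tensor([1, 2, 3]), 1), (torch.tensor([4, 5]), 0)])
# x has shape (2, 3); the shorter sequence is zero-padded
```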
import torch
import torch.nn as nn

class WordEmbedding(nn.Module):
    def __init__(self, vocab_size, embed_size):
        super(WordEmbedding, self).__init__()
        self.vocab_size = vocab_size
        self.embed_size = embed_size
        self.embedding = nn.Embedding(self.vocab_size, self.embed_size, sparse=True)
        # fc1 assumes a fixed flattened input of embed_size * 4767
        self.fc1 = nn.Linear(self.embed_size * 4767, 256)
        self.dropout = nn.Dropout(p=0.3)
        self.fc2 = nn.Linear(256, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # flatten (batch, seq_len, embed_size) into (batch, seq_len * embed_size)
        x = self.embedding(x).view((x.size(0), -1))
        out = self.fc1(x)
        out = self.fc2(self.dropout(out))
        out = self.sigmoid(out)
        return out
Since each batch has a different sequence length, the flattened embedding size doesn't match the fixed input size of fc1 (embed_size * 4767), so I get a matrix shape mismatch error.
Is there any way I can train the model on batches of different input lengths while keeping roughly this architecture?
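One common way to make the architecture length-independent is to pool the embeddings over the sequence dimension (e.g. a mean) before the linear layers, so fc1 depends only on embed_size rather than on sequence length. A minimal sketch, assuming the vocab_size/embed_size values and random batches below are placeholders (sparse=True is dropped here for simplicity):

```python
import torch
import torch.nn as nn

class PooledWordEmbedding(nn.Module):
    def __init__(self, vocab_size, embed_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # fc1 now depends only on embed_size, not on sequence length
        self.fc1 = nn.Linear(embed_size, 256)
        self.dropout = nn.Dropout(p=0.3)
        self.fc2 = nn.Linear(256, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        emb = self.embedding(x)       # (batch, seq_len, embed_size)
        pooled = emb.mean(dim=1)      # (batch, embed_size) for any seq_len
        out = self.fc1(pooled)
        out = self.fc2(self.dropout(out))
        return self.sigmoid(out)

model = PooledWordEmbedding(vocab_size=5000, embed_size=100)
batch_a = torch.randint(0, 5000, (8, 37))   # one batch with seq_len 37
batch_b = torch.randint(0, 5000, (8, 120))  # another with seq_len 120
out_a, out_b = model(batch_a), model(batch_b)
# both outputs have shape (8, 1) despite different sequence lengths
```

nn.EmbeddingBag with `mode="mean"` does the embedding-plus-pooling in one step and also supports sparse gradients, if that matters for your training setup. Note that if batches contain padding, a plain mean will average the pad embeddings in as well; masking them out is a further refinement.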