I am building a feedforward network for a binary text classification task. Here’s my model:
import torch
from torch import nn

class FeedForward(nn.Module):
    def __init__(self, output_size, embedding_dim, hidden_size, dropout=0.2):
        super(FeedForward, self).__init__()
        # Pretrained embeddings; from_pretrained freezes the weights by default.
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(in_features=embedding_dim, out_features=hidden_size),
            nn.ReLU(),
            nn.Dropout(p=dropout),
            nn.Linear(in_features=hidden_size, out_features=hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(p=dropout),
            nn.Linear(in_features=hidden_size // 2, out_features=output_size),
            nn.Sigmoid(),  # outputs probabilities, as nn.BCELoss expects
        )

    def forward(self, input):
        emb = self.embedding(input)    # (batch, seq_len, embedding_dim)
        emb = torch.sum(emb, dim=1)    # sum over tokens: (batch, embedding_dim)
        out = self.linear_relu_stack(emb)
        return out

model = FeedForward(OUTPUT_SIZE, EMBEDDINGS_DIM, HIDDEN_SIZE, 0.5).to(device)
print(model)
FeedForward(
  (embedding): Embedding(204775, 300)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=300, out_features=512, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.4, inplace=False)
    (3): Linear(in_features=512, out_features=256, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.4, inplace=False)
    (6): Linear(in_features=256, out_features=1, bias=True)
    (7): Sigmoid()
  )
)
I am training with a batch size of 64. I have 1,276,686 rows in the training set and 159,586 rows in the validation set. I am using FastText word embeddings, with
loss_function = nn.BCELoss() and
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-7).
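To make the setup concrete, here is a minimal sketch of a single training step with this loss and optimizer. The batch size, embedding dimension, layer sizes, and optimizer settings come from the description above. The vocabulary size, sequence length, random embedding matrix, random batch, and the helper name `train_step` are assumptions for illustration only, not my actual pipeline.

```python
import torch
from torch import nn

torch.manual_seed(0)

# 64 and 300 come from the post; SEQ_LEN and VOCAB_SIZE are assumed.
BATCH_SIZE, SEQ_LEN, VOCAB_SIZE, EMB_DIM = 64, 20, 1000, 300

# Random stand-in for the pretrained FastText matrix (assumption).
embedding = nn.Embedding.from_pretrained(torch.randn(VOCAB_SIZE, EMB_DIM))
classifier = nn.Sequential(
    nn.Linear(EMB_DIM, 512), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(512, 256), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss_function = nn.BCELoss()
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4, weight_decay=1e-7)

def train_step(token_ids, labels):
    """Run one optimizer update on a batch and return the loss as a float."""
    classifier.train()                        # enable dropout
    optimizer.zero_grad()
    pooled = embedding(token_ids).sum(dim=1)  # (batch, emb_dim)
    probs = classifier(pooled).squeeze(1)     # (batch,), values in (0, 1)
    loss = loss_function(probs, labels)       # labels must be float 0.0 / 1.0
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch: random token ids and random binary labels.
token_ids = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN))
labels = torch.randint(0, 2, (BATCH_SIZE,)).float()
loss_value = train_step(token_ids, labels)
print(f"loss on dummy batch: {loss_value:.4f}")
```

Note that `nn.BCELoss` expects probabilities and float targets of the same shape, which is why the output is squeezed to `(batch,)` and the labels are cast to float.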
Now, the problem is that my model starts to overfit after a few epochs. I have tried training in shuffled mini-batches, decreasing the complexity of the model, adding stronger dropout, and using higher weight decay. But no matter what I do, the model begins to overfit once the training accuracy reaches around 81%, and the validation loss then goes up. Validation accuracy never improves beyond 81%. How should I approach this issue?
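One way to pin down where the validation loss turns around is to log the per-epoch losses and look for the epoch with the lowest validation loss. A minimal sketch follows. The function names are hypothetical, and the loss values are made-up placeholders chosen only to show the pattern, not my actual results.

```python
def best_epoch(val_losses):
    """Return the 0-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

def epochs_since_best(val_losses):
    """Number of epochs trained after the best validation epoch."""
    return len(val_losses) - 1 - best_epoch(val_losses)

# Placeholder numbers (NOT real measurements): training loss keeps falling
# while validation loss bottoms out and then rises.
train_losses = [0.60, 0.50, 0.45, 0.41, 0.38, 0.35]
val_losses = [0.58, 0.50, 0.47, 0.46, 0.48, 0.51]

best = best_epoch(val_losses)
wasted = epochs_since_best(val_losses)
print(f"lowest validation loss at epoch index {best}, {wasted} epochs after it")
```

The same index could be used to decide which checkpoint to keep, assuming a checkpoint is saved at the end of every epoch.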
Here is my loss curve over training (validation is orange, training is blue):