Model overfits and validation accuracy does not improve

Hi there,
I am building a feedforward network for a binary text classification task. Here’s my model:

import torch
import torch.nn as nn

class FeedForward(nn.Module):
  def __init__(self, output_size, embedding_dim, hidden_size, dropout=0.2):
    super(FeedForward, self).__init__()
    # embedding_matrix is the pretrained FastText weight tensor, built beforehand
    self.embedding = nn.Embedding.from_pretrained(embedding_matrix)
    self.linear_relu_stack = nn.Sequential(
      nn.Linear(in_features=embedding_dim, out_features=hidden_size),
      nn.ReLU(),
      nn.Dropout(p=dropout),
      nn.Linear(in_features=hidden_size, out_features=hidden_size//2),
      nn.ReLU(),
      nn.Dropout(p=dropout),
      nn.Linear(in_features=hidden_size//2, out_features=output_size),
      nn.Sigmoid(),
    )

  def forward(self, input):
    emb = self.embedding(input)        # (batch, seq_len, embedding_dim)
    emb = torch.sum(emb, dim=1)        # sum word vectors into one bag-of-embeddings per document
    out = self.linear_relu_stack(emb)
    return out

model = FeedForward(OUTPUT_SIZE, EMBEDDINGS_DIM, HIDDEN_SIZE, 0.5).to(device)


Printing the model gives:

FeedForward(
  (embedding): Embedding(204775, 300)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=300, out_features=512, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.4, inplace=False)
    (3): Linear(in_features=512, out_features=256, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.4, inplace=False)
    (6): Linear(in_features=256, out_features=1, bias=True)
    (7): Sigmoid()
  )
)
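
As a quick shape check, here is a hypothetical smoke test of the forward pass (the sizes 1/300/512 match the printout above; the random embedding_matrix is only a stand-in for the real FastText matrix):

# Hypothetical smoke test; embedding_matrix is a random stand-in for FastText
embedding_matrix = torch.randn(204775, 300)
model = FeedForward(output_size=1, embedding_dim=300, hidden_size=512, dropout=0.4)
dummy = torch.randint(0, 204775, (64, 50))   # batch of 64 sequences of 50 token ids
print(model(dummy).shape)                    # torch.Size([64, 1]) — one probability per document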

I am training with a batch size of 64. I have 1,276,686 rows in the train set and 159,586 rows in the validation set. I am using FastText word embeddings, with loss_function = nn.BCELoss() and optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-7).
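
For context, the training loop is the standard PyTorch pattern (a simplified sketch; train_loader is assumed to be a DataLoader over the training set with shuffle=True and batch_size=64, and NUM_EPOCHS is a placeholder):

for epoch in range(NUM_EPOCHS):
  model.train()
  for inputs, labels in train_loader:                # shuffled mini-batches of 64
    inputs = inputs.to(device)
    labels = labels.float().unsqueeze(1).to(device)  # BCELoss needs float targets shaped like the output
    optimizer.zero_grad()
    outputs = model(inputs)                          # sigmoid probabilities in (0, 1)
    loss = loss_function(outputs, labels)
    loss.backward()
    optimizer.step()
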
Now, the problem is that my model starts to overfit after a few epochs. I have tried training on shuffled mini-batches, decreasing the complexity of the model, adding stronger dropout, and using higher weight decay. But no matter what I do, the model always overfits once training accuracy reaches around 81%: the validation loss starts going up, and validation accuracy never improves beyond roughly 81%. How should I approach this issue?

Here is my training graph of loss (validation is orange, train is blue):

[loss plot]

And accuracy:

[accuracy plot]

As far as I am concerned, your plots look about as expected as it gets :). That your test/validation loss stops improving at some point simply means that your test/validation data is, in some sense, different from your training data. In other words, your test/validation set just “looks” different from your training data, which in practice is normally the case.
In some sense this means that you don’t have enough data for any split into training/validation/test sets to yield sets that “look” the same. Again, having that much data is more the exception than the rule in practice.
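
If you want to sanity-check how different your two splits actually are, one simple diagnostic (a sketch; train_texts and val_texts are hypothetical lists of already-tokenized documents) is to measure how much of the validation vocabulary never appears in training:

# Hypothetical diagnostic: vocabulary overlap between splits.
# train_texts / val_texts are assumed to be lists of tokenized documents.
train_vocab = {tok for doc in train_texts for tok in doc}
val_vocab = {tok for doc in val_texts for tok in doc}
unseen = val_vocab - train_vocab
print(f"{len(unseen) / len(val_vocab):.1%} of validation tokens never occur in training")

A high fraction of unseen tokens is one concrete way a validation set can “look” different from the training set.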