Got almost zero loss from the second iteration in multi-label classification

Hi everyone,

I’m trying to train a neural network on a problem that I framed as multi-label classification. After reading a lot about the problem, everything almost works. However, I get bad results when evaluating the model, and I realized that the training loss is very low (almost zero) starting from the second iteration of training.

  • Loss function: BCEWithLogitsLoss with a linear output layer
  • Optimizer: Adam
  • Note1: The dataset is very sparse: I have over 14k labels but only about 10 true labels in each sample. Could this affect the result?
  • Note2: As I was getting a memory error when loading the whole dataset, I had to train on part of the data, save the model and optimizer, and re-load them to train on the next part. (I’ve tried all the other solutions in these topics, but none of them worked.)
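To put numbers on the sparsity concern in Note1, here is a quick back-of-the-envelope check (pure Python; the sizes are the ones from my dataset, the constant logit of -10 is just an assumption). With ~10 positives out of 14k labels, a model that outputs a large negative logit for *every* label already gets a near-zero mean BCE loss, so a tiny loss doesn’t necessarily mean the model has learned anything:

```python
import math

num_labels = 14_000
num_positives = 10
logit = -10.0  # assume the model predicts "absent" for every label

# BCE-with-logits per element: softplus(z) for target 0, softplus(-z) for target 1
loss_neg = math.log1p(math.exp(logit))   # target 0, logit -10: tiny
loss_pos = math.log1p(math.exp(-logit))  # target 1, logit -10: ~10, but rare

mean_loss = ((num_labels - num_positives) * loss_neg
             + num_positives * loss_pos) / num_labels
print(f"mean BCE loss: {mean_loss:.4f}")  # ≈ 0.0072 despite predicting nothing
```

So the averaging over 14k mostly-negative labels can hide the errors on the few positives; this is why I’m wondering whether the sparsity itself explains the near-zero loss.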
Any ideas would be much appreciated!
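For reference, the chunked-training workflow from Note2 follows this save/reload pattern (a minimal sketch with a toy model; the file name and layer sizes are placeholders, not my actual code). Saving both state dicts matters because Adam keeps per-parameter moment estimates that would otherwise be reset between chunks:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 14_000)  # linear output layer, one logit per label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... train on the first chunk of data ...

# Save BOTH state dicts so Adam's moment estimates survive the reload;
# saving only the model weights silently restarts the optimizer.
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    "checkpoint.pt",
)

# Later, before training on the next chunk:
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
```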