This is my problem setup.
Training input (6300 x 300): standard BERT embeddings, so the values are floating-point numbers, mostly negative.
Training output (6300 x 50): each row is a binary (multi-hot) vector like [0, 0, 1, 1, 0, … 0].
Validation set: 800 samples.
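For concreteness, here is a minimal sketch of this data in PyTorch, assuming the embeddings and labels are already available as tensors (random placeholders stand in for them below). Because the inputs are raw BERT features that are mostly negative, it also standardizes each feature using training-set statistics only. That preprocessing step is a common choice offered as an assumption, not a confirmed fix for this problem.

```python
import torch

torch.manual_seed(0)

# Placeholder tensors with the shapes from the post; load the real BERT
# features and labels here instead (random values are for illustration only).
X_train = torch.randn(6300, 300)
Y_train = (torch.rand(6300, 50) < 0.1).float()  # multi-hot labels in {0, 1}
X_val = torch.randn(800, 300)
Y_val = (torch.rand(800, 50) < 0.1).float()

# Standardize each input feature with statistics from the training set only,
# so validation data is transformed exactly as unseen data would be.
mean = X_train.mean(dim=0, keepdim=True)
std = X_train.std(dim=0, keepdim=True).clamp_min(1e-6)  # avoid division by zero
X_train_std = (X_train - mean) / std
X_val_std = (X_val - mean) / std
```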
I want to train a neural network with two hidden layers that maps the training inputs to their outputs. I have tried BCELoss and tuned the learning rate, weight decay, dropout probability, and batch size. I also increased the hidden layer size to 1000 and changed the number of layers. However, my validation loss either does not decrease or increases, and the training loss stops decreasing after some epochs.
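A minimal PyTorch sketch of the setup described above: an MLP with two hidden layers that outputs one logit per label, with the validation loss checked after every epoch. It assumes nn.BCEWithLogitsLoss in place of a final sigmoid plus nn.BCELoss (the same loss, computed in a more numerically stable way), AdamW for weight decay, and early stopping on validation loss. The class name MultiLabelMLP and every hyperparameter value (hidden size 256, dropout 0.3, lr 1e-3, weight decay 1e-2, batch size 64, patience 5) are hypothetical placeholders, not tuned settings.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data with the post's shapes; swap in the real features and labels.
X_train, Y_train = torch.randn(6300, 300), (torch.rand(6300, 50) < 0.1).float()
X_val, Y_val = torch.randn(800, 300), (torch.rand(800, 50) < 0.1).float()

class MultiLabelMLP(nn.Module):
    """Hypothetical model: two hidden layers, one raw logit per label (no sigmoid)."""
    def __init__(self, in_dim=300, hidden=256, out_dim=50, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

model = MultiLabelMLP()
# BCEWithLogitsLoss applies the sigmoid internally, which is more numerically
# stable than nn.Sigmoid followed by nn.BCELoss.
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
train_loader = DataLoader(TensorDataset(X_train, Y_train), batch_size=64, shuffle=True)

best_val, best_state, patience, bad_epochs = float("inf"), None, 5, 0
for epoch in range(100):
    model.train()  # enables dropout
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()  # disables dropout for evaluation
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), Y_val).item()

    # Early stopping: keep the weights with the lowest validation loss and stop
    # once it has not improved for `patience` consecutive epochs.
    if val_loss < best_val:
        best_val, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

model.load_state_dict(best_state)  # restore the best checkpoint
```

At prediction time, torch.sigmoid(model(x)) gives per-label probabilities; thresholding them at 0.5 is a default assumption, and a threshold tuned on the validation set may suit sparse labels better.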
I suspect the problem is that I have too many labels. I am wary of increasing the hidden layer size too much, since that increases the complexity of the network. Eventually I want to scale the model to 100k samples.
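One way to check whether the labels themselves are the difficulty is to look at how often each label is positive and compare the model against a trivial baseline. The sketch below assumes PyTorch and placeholder labels. It computes the validation BCE of always predicting each label's training frequency (a trained model's validation loss should beat this number), plus a pos_weight vector for nn.BCEWithLogitsLoss. Reweighting with pos_weight only makes sense if the labels are sparse, which is an assumption here.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder multi-hot labels with the post's shapes; use the real Y_train / Y_val.
Y_train = (torch.rand(6300, 50) < 0.1).float()
Y_val = (torch.rand(800, 50) < 0.1).float()

pos = Y_train.sum(dim=0)          # positives per label, shape (50,)
neg = Y_train.shape[0] - pos      # negatives per label
pos_rate = pos / Y_train.shape[0]
print("labels with < 1% positives:", int((pos_rate < 0.01).sum()))

# Baseline: predict each label's training frequency for every validation row.
# If a trained model's validation loss is not below this, it has learned
# nothing beyond the label frequencies.
base_logits = torch.logit(pos_rate.clamp(1e-6, 1 - 1e-6)).expand_as(Y_val)
baseline_loss = nn.BCEWithLogitsLoss()(base_logits, Y_val).item()

# Optional reweighting for sparse labels: pos_weight = negatives / positives
# per label, so rare positives contribute more to the loss.
pos_weight = neg / pos.clamp_min(1.0)
weighted_loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```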
How should I design the network for this problem?