You have a data imbalance problem. To alleviate it, you can either:
- Use a weighted cross-entropy loss, or
- Use `WeightedRandomSampler`; see this post for more clarification.
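
As a rough illustration of both options, here is a minimal sketch in PyTorch. The toy dataset (90 samples of class 0, 10 of class 1), the inverse-frequency weighting scheme, and all variable names are my assumptions, not something taken from your code; other weighting schemes are equally valid.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy imbalanced dataset (hypothetical): 90 samples of class 0, 10 of class 1.
features = torch.randn(100, 4)
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])

# Inverse-frequency class weights (one common choice): rarer classes weigh more.
class_counts = torch.bincount(labels)
class_weights = labels.numel() / (len(class_counts) * class_counts.float())

# Option 1: weighted cross-entropy. Errors on the rare class cost more.
criterion = nn.CrossEntropyLoss(weight=class_weights)
logits = torch.randn(8, 2)               # stand-in for model outputs
targets = torch.randint(0, 2, (8,))      # stand-in for a batch of labels
loss = criterion(logits, targets)

# Option 2: WeightedRandomSampler. Each sample is drawn with probability
# proportional to its class weight, so batches come out roughly balanced.
sample_weights = class_weights[labels]   # one weight per sample
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=20, sampler=sampler)
```

You would normally pick one of the two rather than both, since combining them corrects for the imbalance twice. Note that `sampler` and `shuffle=True` cannot be used together in a `DataLoader`.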
After that, you can experiment with different optimizers and with different learning rates, using learning rate schedulers.
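
For the optimizer and scheduler part, a minimal sketch could look like the following. The placeholder model, the choice of SGD with `StepLR`, and the hyperparameter values are only assumptions for illustration; swap in whatever optimizer and scheduler you want to try.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4, 2)  # placeholder model (hypothetical)
criterion = nn.CrossEntropyLoss()

# Swap this line to try another optimizer, e.g. torch.optim.Adam.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# StepLR multiplies the learning rate by gamma every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

inputs = torch.randn(16, 4)            # stand-in for a training batch
targets = torch.randint(0, 2, (16,))

lr_history = []
for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    lr_history.append(optimizer.param_groups[0]["lr"])
    scheduler.step()  # call once per epoch, after optimizer.step()
```

Logging the learning rate as in `lr_history` makes it easy to confirm the schedule behaves as intended before comparing runs.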