Training loss is either very large or very small

I’m trying to classify two very similar fonts using a binary classification network. I’m using `BCEWithLogitsLoss`, but the training loss is either very large (e.g., 0.69) or very small (e.g., 0.002). What could be causing this?
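
As a reference point for those two numbers, here is a minimal sketch that computes binary cross-entropy on raw logits by hand. It is a pure-Python stand-in for a mean-reduced loss of this kind, not my actual training code; the helper name `bce_with_logits`, the batch of labels, and the logit magnitude of 6 are all made up for illustration. It suggests that 0.69 is roughly what you get when every prediction sits at probability 0.5, and 0.002 is roughly what confident, correct predictions give.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly on logits.

    Uses the numerically stable form
        max(x, 0) - x * y + log(1 + exp(-|x|))
    for each logit x and label y in {0, 1}.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# Hypothetical batch of binary labels (assumption, not real data).
targets = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]

# A logit of 0 means sigmoid(0) = 0.5 for every sample: an uninformed guess.
chance_logits = [0.0] * len(targets)
chance_loss = bce_with_logits(chance_logits, targets)

# Confident and correct: large magnitude, sign matching the label.
confident_logits = [6.0 if y == 1.0 else -6.0 for y in targets]
confident_loss = bce_with_logits(confident_logits, targets)

print(f"loss at logit 0:       {chance_loss:.4f}  (ln 2 = {math.log(2):.4f})")
print(f"loss at confident fit: {confident_loss:.4f}")
```

If that reading is right, a loss stuck near 0.69 would mean the network is outputting logits close to 0 and has not learned to separate the two fonts, rather than the loss being "large" in an absolute sense.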