Hello everyone, I’m using knowledge distillation to train a model. The teacher model was trained beforehand and is used to guide the student. However, while the student trains, the total loss comes out negative. Is this normal behaviour during training? Any suggestions are welcome. Thank you very much.
Here is the relevant section of the code.
temp = 5
alpha = 0.9
for phase in ['train', 'val']:
    ......
    losses = []
    for data, targets in pbar:
        data = data.to(device)
        targets = targets.to(device)

        # teacher is frozen; no gradients needed for its forward pass
        with torch.no_grad():
            teacher_preds = teacher(data)

        student_preds = student(data)
        _, preds = torch.max(student_preds, 1)
        student_loss = student_loss_fn(student_preds, targets)
        distillation_loss = divergence_loss_fn(
            F.softmax(student_preds / temp, dim=1),
            F.softmax(teacher_preds / temp, dim=1)
        )
        loss = alpha * student_loss + (1 - alpha) * distillation_loss
        losses.append(loss.item())
        running_corrects += torch.sum(preds == targets.data)
        curr_train_samples += len(targets)

        # backward
        optimizer.zero_grad()
        if phase == 'train':
            loss.backward()
            optimizer.step()

    epoch_loss = sum(losses) / len(losses)
    epoch_acc = running_corrects.double() / curr_train_samples
    ....
    print(f'\t{phase}: Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
Sample output
Epoch 1
train: Loss: -0.2539 Acc: 0.3342
val: Loss: -0.2725 Acc: 0.4614
Epoch 2
train: Loss: -0.4222 Acc: 0.5226
val: Loss: -0.4394 Acc: 0.5932
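In case it helps narrow things down: if divergence_loss_fn is something like PyTorch's nn.KLDivLoss, it computes target * (log(target) - input) and therefore expects its first argument to already be log-probabilities (F.log_softmax), not probabilities. Below is a minimal pure-Python sketch of that formula with made-up logits, showing how feeding probabilities into the "input" slot pushes the value negative, while feeding log-probabilities gives a true, non-negative KL divergence:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits standing in for one sample's student/teacher outputs.
temp = 5
p = softmax([x / temp for x in [2.0, 0.5, -1.0]])   # student probabilities
q = softmax([x / temp for x in [1.5, 1.0, -0.5]])   # teacher probabilities

# KLDivLoss-style formula: sum(target * (log(target) - input)),
# which assumes `input` already holds log-probabilities.

# Feeding probabilities as `input` (as in the snippet above):
wrong = sum(qi * (math.log(qi) - pi) for qi, pi in zip(q, p))

# Feeding log-probabilities as `input`:
right = sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))

print(wrong)   # negative: log(q) < 0 while p > 0, so every term is negative
print(right)   # >= 0, as a KL divergence must be
```

Since every probability is below 1, log(q) is negative and p is positive, so each term of the "wrong" sum is negative regardless of the logits, which would explain a loss that is negative from the first epoch.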