What is a soft label? Is a soft label different from the general CE (cross-entropy) target in the teacher model?
I suppose you are referring to Knowledge Distillation/Model compression.
The output of a model (called the teacher model in this context) is what is generally referred to as a soft label. It is called soft because the output need not be a strict one-hot vector such as [1, 0, 0] for a 3-class classification task; instead it might be something like [0.85, 0.1, 0.05].
This soft label is used to train a much smaller network (called the student model) in place of the hard targets.
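To make the difference concrete, here is a minimal sketch in plain Python. The helper names (`softmax`, `cross_entropy`) and the logit values are made up for illustration and do not come from any particular library. The optional `temperature` argument is an assumption on my part: distillation setups commonly use it to soften the teacher's distribution further, but it is not required for the basic idea.

```python
import math

def softmax(logits, temperature=1.0):
    # Turn raw scores into probabilities; a higher temperature gives a flatter distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    # CE between a target distribution (hard or soft) and predicted probabilities.
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hard target: one-hot vector for class 0 in a 3-class task.
hard_label = [1.0, 0.0, 0.0]

# Hypothetical logits, chosen only for illustration.
teacher_logits = [3.0, 1.0, 0.2]
student_logits = [2.0, 0.5, 0.1]

# Soft label: the teacher's output distribution.
soft_label = softmax(teacher_logits)
student_probs = softmax(student_logits)

# Same CE formula, different targets.
loss_hard = cross_entropy(hard_label, student_probs)
loss_soft = cross_entropy(soft_label, student_probs)

print("soft label:", [round(p, 3) for p in soft_label])
print("CE vs hard label:", loss_hard)
print("CE vs soft label:", loss_soft)
```

So the loss function itself is still cross-entropy in both cases; what changes is the target. With a hard label only the true class contributes to the loss, while with a soft label every class contributes in proportion to the probability the teacher assigned to it.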
Hope this makes sense. Thanks.
Thank you for your kind reply.