What is a soft label? Is a soft label different from the general CE used in the teacher model?
I suppose you are referring to Knowledge Distillation/Model compression.
The output of a model (called the teacher model in this context) is what is generally
referred to as a soft label. It is called soft because the output is usually not a one-hot
vector such as [1, 0, 0] for a 3-class classification task; instead it might be something like
[0.85, 0.1, 0.05].
This soft label is used to train a much smaller network (called the student model) in place of the hard targets.
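To make the difference concrete, here is a minimal sketch in plain Python. The logits for both models are made-up numbers, and `softmax` and `cross_entropy` are small helpers written for this example, not taken from any library. It shows that the cross-entropy formula itself is the same in both cases; the only thing that changes is whether the target is a one-hot vector or the teacher's output distribution.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    # H(target, predicted) = -sum_i target_i * log(predicted_i)
    # Terms with target_i == 0 contribute nothing and are skipped.
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical logits for a 3-class task (illustrative values only).
teacher_logits = [4.0, 1.9, 1.2]
student_logits = [2.0, 1.0, 0.5]

hard_label = [1.0, 0.0, 0.0]           # one-hot ground truth
soft_label = softmax(teacher_logits)   # teacher output, roughly [0.85, 0.10, 0.05]
student_probs = softmax(student_logits)

# Same loss function, different target distribution.
loss_hard = cross_entropy(hard_label, student_probs)
loss_soft = cross_entropy(soft_label, student_probs)

print("soft label:", [round(p, 3) for p in soft_label])
print("CE with hard label:", loss_hard)
print("CE with soft label:", loss_soft)
```

With the hard label, only the student's probability for the correct class enters the loss. With the soft label, every class contributes in proportion to the probability the teacher assigned to it, so the student also learns how the teacher ranks the wrong classes.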
Hope this makes sense. Thanks.
Thank you for your kind reply.