Initialize bias using class probabilities

In Andrej Karpathy’s famous “A Recipe for Training Neural Networks” post, he recommends:

Initialize the final layer weights correctly. […] If you have an imbalanced dataset of a ratio 1:10 of positives:negatives, set the bias on your logits such that your network predicts probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves where in the first few iteration your network is basically just learning the bias.

How do you do this?

More concretely, say we have the following:

  • a multi-class, multi-label dataset
  • len(dataset) is 1000
  • dataset sample counts: {A: 10, B: 100, C: 900}
  • model.fc = nn.Linear(num_features, 3, bias=True)

Therefore, I want the model to output probabilities [0.01, 0.1, 0.9] at initialization.
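
The target probabilities here are just the empirical class frequencies (count / dataset size), e.g.:

```python
counts = {"A": 10, "B": 100, "C": 900}
n = 1000  # len(dataset)

# Empirical prior for each class: count / total samples
priors = [c / n for c in counts.values()]
print(priors)  # [0.01, 0.1, 0.9]
```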

How do I calculate the bias tensor I need, and how do I initialize the layer properly?

import torch

x = torch.tensor([0.01, 0.09, 0.9])  # desired probabilities; must sum to 1 (0.1 nudged to 0.09)
log_x = x.log()
log_x.softmax(-1)  # recovers x, so log_x is the bias you're asking for

You don’t have to set the weights: at initialization, matmul(weight, hidden_features) is assumed to produce zero-centered noise, so the bias dominates the initial output.
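
Putting it together, a minimal sketch of initializing the bias of an `nn.Linear` head (here `num_features = 16` is a hypothetical value; use your model's feature width). Feeding a zero input isolates the bias, which mimics the zero-centered weight noise averaging out:

```python
import torch
import torch.nn as nn

num_features = 16  # hypothetical; use your model's actual feature width
fc = nn.Linear(num_features, 3, bias=True)

# Desired initial probabilities; must sum to 1 for a softmax head.
p = torch.tensor([0.01, 0.09, 0.9])

# Copy log-probabilities into the bias without tracking gradients.
with torch.no_grad():
    fc.bias.copy_(p.log())

# With a zero input, the output logits equal the bias,
# so softmax recovers the desired probabilities.
logits = fc(torch.zeros(1, num_features))
probs = logits.softmax(-1)
print(probs)  # ≈ tensor([[0.0100, 0.0900, 0.9000]])
```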