I’m training a basic feedforward NN on an imbalanced dataset and I wanted to implement a two-phase training approach:
In the first training phase the model is trained on an undersampled version of the dataset so that all labels have equal probability.
The second phase then accounts for the data imbalance by re-training only the output layer of the NN with the original distribution of the dataset.
What would be the workflow for something like this?
Train the entire model on the undersampled dataset → set param.requires_grad = False for all layers except the last (output) layer → save the model parameters → re-load the model and train on the imbalanced dataset?
Is it possible to do all this in one run or would I do two separate runs?
Any tips/insight would be appreciated!
You wouldn’t need to save and reload the model and could do both training passes in a single script.
However, if you find it more convenient to write two separate scripts, that's also fine; I don't see any reason that wouldn't work either.
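A single-run version could look roughly like the sketch below. This is a minimal, hedged example: the two-layer stand-in model, the random tensors, and the `run_phase` helper are all placeholders for your own network, DataLoaders, and training loop, and the layer index "2" matches this toy `nn.Sequential`, not your model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in feedforward model; replace with your own architecture.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()

def run_phase(model, x, y, epochs=5):
    # Build the optimizer from the currently trainable parameters only,
    # so frozen layers are never updated.
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Phase 1: train the full model on the balanced (undersampled) data.
x_bal, y_bal = torch.randn(60, 10), torch.randint(0, 3, (60,))
run_phase(model, x_bal, y_bal)

# Phase 2: freeze everything except the output layer, then train on the
# original, imbalanced data. "2." is the output layer's index in this
# toy Sequential model.
for name, param in model.named_parameters():
    if not name.startswith("2."):
        param.requires_grad = False

x_imb, y_imb = torch.randn(200, 10), torch.randint(0, 3, (200,))
run_phase(model, x_imb, y_imb)
```

Since both phases share the same `model` object in memory, nothing needs to be saved or reloaded between them.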
If I use two scripts, the first script will end by saving the model weights:
torch.save(model.state_dict(), out_dir / "model_parameters.pth")
And then the second script starts by loading those weights into a new model?
model2 = NeuralNetwork()  # re-create the architecture first (class name assumed); load_state_dict returns key-matching info, not the model
model2.load_state_dict(torch.load("model_parameters.pth"))
And then I presume the freezing step will follow:
for name, param in model2.named_parameters():
    if '12' not in name:
        param.requires_grad = False
I’m using ‘12’ since the names in named_parameters are as follows:
['linear_relu_stack.0.weight', 'linear_relu_stack.0.bias', 'linear_relu_stack.3.weight', 'linear_relu_stack.3.bias', 'linear_relu_stack.6.weight', 'linear_relu_stack.6.bias', 'linear_relu_stack.9.weight', 'linear_relu_stack.9.bias', 'linear_relu_stack.12.weight', 'linear_relu_stack.12.bias']
Is my thinking here correct?
Yes, your approach sounds correct. One detail to watch: create the optimizer for the second phase after freezing, and pass it only the trainable parameters.
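As a small sketch of that last point (the two-layer model here is a stand-in, so the frozen-layer check uses index "2." rather than your "12"):

```python
import torch
import torch.nn as nn

# Stand-in model; in your case this would be the reloaded model2.
model2 = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

# Freeze everything except the output layer ("2." in this toy model).
for name, param in model2.named_parameters():
    if not name.startswith("2."):
        param.requires_grad = False

# Build the phase-2 optimizer from trainable parameters only, so the
# frozen layers can't be touched by momentum or weight decay either.
optimizer = torch.optim.SGD(
    (p for p in model2.parameters() if p.requires_grad), lr=1e-2
)

# Only the output layer's weight and bias should be registered:
n_tensors = sum(len(g["params"]) for g in optimizer.param_groups)
```

Passing all parameters would also work (frozen ones get no gradients), but filtering makes the intent explicit and avoids accidental updates from optimizer state.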