Two-phase training for imbalanced data

JessM · August 18, 2022, 2:10pm

Hi everyone,
I’m training a basic feedforward NN on an imbalanced dataset and I wanted to implement a two-phase training approach:
In the first training phase the model is trained on an undersampled version of the dataset so that all labels have equal probability.
The second phase then accounts for the data imbalance by re-training only the output layer of the NN with the original distribution of the dataset.
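As a rough illustration of the undersampling step in phase one, here is a minimal sketch that picks an equal number of sample indices per class. The helper name `undersample_indices` and the toy labels are made up for this example and are not from my actual code.

```python
import random
from collections import defaultdict


def undersample_indices(labels, seed=0):
    """Return shuffled indices in which every class appears as often as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Size of the rarest class determines how many samples each class keeps.
    n_min = min(len(idxs) for idxs in by_class.values())

    balanced = []
    for idxs in by_class.values():
        balanced.extend(rng.sample(idxs, n_min))
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced labels (assumption): 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
balanced = undersample_indices(labels)
```

The resulting index list could then be wrapped with `torch.utils.data.Subset(dataset, balanced)` to build the phase-one dataset.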

What would be the workflow for something like this?
Train the entire model on the undersampled dataset → set param.requires_grad = False for all layers except the last (output) layer → save the model parameters → re-load the model and train on the imbalanced dataset?

Is it possible to do all this in one run or would I do two separate runs?

Any tips/insight would be appreciated!

ptrblck · August 19, 2022, 4:23am

You wouldn’t need to save and reload the model; you could do both training phases in a single script.
However, if you find it more convenient to write two separate scripts, that’s fine too, and I don’t see a reason why it wouldn’t work.
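A minimal sketch of the single-script variant, assuming a small stand-in `nn.Sequential` model and random toy tensors (the real architecture and data are not shown in this thread). The `train` helper is hypothetical; it builds a fresh optimizer over only the parameters that still require gradients, so phase two updates the output layer alone.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in model (assumption): the last module plays the role of the output layer.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))


def train(model, inputs, targets, epochs=5):
    # Create a new optimizer that only sees the currently trainable parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()


# Toy data (assumption): a balanced subset and the full imbalanced set.
x_balanced, y_balanced = torch.randn(20, 8), torch.tensor([0, 1] * 10)
x_full, y_full = torch.randn(100, 8), torch.tensor([0] * 90 + [1] * 10)

# Phase 1: train every layer on the undersampled data.
train(model, x_balanced, y_balanced)

# Phase 2: freeze everything, then unfreeze only the output layer.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

hidden_before = model[0].weight.clone()
output_before = model[-1].weight.clone()
train(model, x_full, y_full)
```

Comparing `model[0].weight` with `hidden_before` after phase two is a quick way to confirm that the frozen layers were left untouched.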

JessM · August 19, 2022, 12:08pm

If I use two scripts, the first script will end by saving the model weights:

torch.save(model.state_dict(), out_dir / "model_parameters.pth")

And then the second script starts by reloading those weights to a new model?

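model2 = NeuralNetwork()  # re-create the same architecture first (class name assumed)
model2.load_state_dict(torch.load("model_parameters.pth"))  # loads in place; does not return the model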

And then I presume the freezing step will follow:

    for name, param in model2.named_parameters():
        if '12' not in name:
            param.requires_grad = False

I’m using '12' since the names in named_parameters are as follows:
['linear_relu_stack.0.weight', 'linear_relu_stack.0.bias', 'linear_relu_stack.3.weight', 'linear_relu_stack.3.bias', 'linear_relu_stack.6.weight', 'linear_relu_stack.6.bias', 'linear_relu_stack.9.weight', 'linear_relu_stack.9.bias', 'linear_relu_stack.12.weight', 'linear_relu_stack.12.bias']

Is my thinking here correct?

ptrblck · August 19, 2022, 3:26pm

Yes, your approach sounds correct.
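For the two-script variant, a minimal sketch of the save, reload, and freeze steps could look like the following. The `NeuralNetwork` class is a hypothetical stand-in, built only so that its `Linear` layers land at indices 0, 3, 6, 9, and 12 of `linear_relu_stack` as in the parameter names above; the layers in between and all sizes are assumptions. Note that `load_state_dict` loads the weights in place and returns a report of missing and unexpected keys rather than the model, so it is called on a freshly constructed instance.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


class NeuralNetwork(nn.Module):
    """Hypothetical architecture with Linear layers at stack indices 0, 3, 6, 9, 12."""

    def __init__(self):
        super().__init__()
        layers = []
        for _ in range(4):
            layers += [nn.Linear(8, 8), nn.ReLU(), nn.Dropout(0.1)]
        layers.append(nn.Linear(8, 2))  # output layer at index 12
        self.linear_relu_stack = nn.Sequential(*layers)

    def forward(self, x):
        return self.linear_relu_stack(x)


out_dir = Path(tempfile.mkdtemp())  # temporary directory used only for this sketch

# End of the first script: save the weights after phase-one training.
model = NeuralNetwork()
torch.save(model.state_dict(), out_dir / "model_parameters.pth")

# Start of the second script: rebuild the model, then load the weights into it.
model2 = NeuralNetwork()
result = model2.load_state_dict(torch.load(out_dir / "model_parameters.pth"))

# Freeze everything except the output layer, matching on the full name prefix.
for name, param in model2.named_parameters():
    param.requires_grad = name.startswith("linear_relu_stack.12.")

trainable = [name for name, p in model2.named_parameters() if p.requires_grad]

# Hand only the trainable parameters to the phase-two optimizer.
optimizer = torch.optim.SGD(
    (p for p in model2.parameters() if p.requires_grad), lr=0.01
)
```

Matching on the prefix `linear_relu_stack.12.` instead of the bare substring `'12'` is a small safeguard in case another parameter name ever happens to contain those digits.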