Best practice for optimizer param_dict structure to enable hyperparameter tuning?

Lee_Zamparo · June 27, 2018, 8:49pm

I’m looking to integrate a simple hyperparameter search into my training script by varying regularization parameter values (e.g. the ‘weight_decay’ applied to the ‘params’ in one of the param dicts in optimizer_param_dicts).

The problem is that there does not seem to be a good way to identify which param groups I want to tune and which I don’t. Here is a sample of how I set up my optimizer:

# Initialize the params, put together the arguments for the optimizer
optimizer = torch.optim.Adam
optimizer_param_dicts = [
    {'params': weights, 'weight_decay': 5e-3},
    {'params': biases, 'weight_decay': 5e-3},
    {'params': sparse_weights, 'weight_decay': 10e-3, 'tunable': True},
]
optimizer_kwargs = {'lr': learning_rate_schedule[0]}

I’d like to make the weight_decay applied to the params in the sparse_weights list a tunable hyperparameter. But the only way I’ve found to identify which param group holds the sparse weights is to add this extra 'tunable' key to the dict, and then to iterate through the groups later, mutating the values as required:

# optim is the instantiated optimizer
for pg in optim.param_groups:
    if pg.get('tunable', False):
        pg['weight_decay'] = new_weight_decay  # placeholder for my updated value
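
A slightly more explicit variant of that loop would tag each group with a name instead of a boolean flag. This is only a sketch under assumptions: the 'name' key and the helper set_group_option are hypothetical, not part of torch.optim, and it relies on the same behaviour the 'tunable' key already depends on, namely that extra keys in a param dict survive into param_groups. Plain dicts with placeholder strings stand in for the real groups here.

```python
def set_group_option(param_groups, group_name, option, value):
    """Set `option` to `value` on every group whose 'name' matches.

    Returns the number of groups updated and raises KeyError if none match,
    so a misspelled group name does not pass silently.
    """
    updated = 0
    for group in param_groups:
        if group.get("name") == group_name:
            group[option] = value
            updated += 1
    if updated == 0:
        raise KeyError(f"no param group named {group_name!r}")
    return updated


# Stand-in for the optimizer's param_groups: plain dicts, with placeholder
# strings where the real parameter tensors would go.
param_groups = [
    {"name": "weights", "params": ["w"], "weight_decay": 5e-3},
    {"name": "biases", "params": ["b"], "weight_decay": 5e-3},
    {"name": "sparse_weights", "params": ["s"], "weight_decay": 10e-3},
]

# Only the group named "sparse_weights" is touched.
set_group_option(param_groups, "sparse_weights", "weight_decay", 1e-4)
```

With a real optimizer, the first argument would presumably be its param_groups list; naming groups also allows more than one group to be tuned independently.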

This seems … oooookay for now but far from ideal. I’ve seen other questions about changing learning rates, but this is different since the learning rate is accessible in optim.defaults['lr']. I can’t seem to find another thread that addresses this topic, so I’d appreciate it if anyone could weigh in.
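
One option that avoids mutating groups after the fact, assuming each trial of the search can start from a freshly constructed optimizer, is to build the param dicts from a small factory function. The sketch below is illustrative only: make_param_dicts and the candidate values are hypothetical, and placeholder lists stand in for the real parameter tensors.

```python
def make_param_dicts(weights, biases, sparse_weights, sparse_weight_decay):
    """Return fresh optimizer param dicts; only the sparse group's decay varies."""
    return [
        {"params": weights, "weight_decay": 5e-3},
        {"params": biases, "weight_decay": 5e-3},
        {"params": sparse_weights, "weight_decay": sparse_weight_decay},
    ]


# Placeholder lists stand in for the real parameter tensors.
weights, biases, sparse_weights = ["w"], ["b"], ["s"]

candidate_decays = [1e-3, 1e-2, 1e-1]  # illustrative values only

# One list of param dicts per trial of the search.
trials = [
    make_param_dicts(weights, biases, sparse_weights, wd)
    for wd in candidate_decays
]
```

In a training script, each entry of trials would be handed to the optimizer class together with optimizer_kwargs. The trade-off is that any optimizer state is discarded between trials, which is usually what a search over fresh runs wants anyway.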