Hello,
I have trained a language model and now I want to fine-tune this pre-trained model.
model.summary()
model.load_encoder('lmtest')
model.freeze()
model.summary()
Before loading the encoder I look at the model summary, and again after loading and freezing. However, both summaries show the same trainable modules and the same number of trainable parameters. I would expect freeze() to set everything other than the last layer to non-trainable, if I understood correctly. So why doesn't freeze() change anything visible?
I am quite new to PyTorch and would appreciate your guidance.
ecdrid (Aditya)
May 5, 2020, 10:11am
After calling freeze(), try printing the parameters that are still trainable:

for name, params in model.named_parameters():
    if params.requires_grad:
        print(name)
Thanks a lot! I tried it and got the following error:
'RNNLearner' object has no attribute 'named_parameters'
This post says it might be an indentation error, but I checked and there was none.
ecdrid (Aditya)
May 5, 2020, 10:21am
What's RNNLearner? Also, if it's a model that inherits from nn.Module, then the above 3 lines will definitely run.
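For example, with a throwaway module (just to show the claim, nothing fastai-specific):

import torch.nn as nn

# any nn.Module exposes named_parameters(), and freshly created
# parameters have requires_grad=True
m = nn.Linear(2, 3)
for name, p in m.named_parameters():
    print(name, p.requires_grad)   # prints: weight True, then bias True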
It is from the ULMFiT architecture and, as I understand it, takes a SequentialRNN (which inherits from nn.Sequential) as input.
for i,w in enumerate(itos_new):
    r = stoi_wgts[w] if w in stoi_wgts else -1
    new_w[i] = enc_wgts[r] if r>=0 else wgts_m
    if dec_bias is not None: new_b[i] = dec_bias[r] if r>=0 else bias_m
wgts['0.encoder.weight'] = new_w
if '0.encoder_dp.emb.weight' in wgts: wgts['0.encoder_dp.emb.weight'] = new_w.clone()
wgts['1.decoder.weight'] = new_w.clone()
if dec_bias is not None: wgts['1.decoder.bias'] = new_b
return wgts
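(For context, that fragment copies pretrained embedding rows onto a new vocabulary, falling back to the mean embedding for unseen words. A toy version of the same remapping, with made-up names and sizes rather than real fastai weights:)

import torch

# toy version of the remapping above; vocab and sizes are made up
enc_wgts  = torch.randn(3, 4)                  # old embedding: 3 words, dim 4
stoi_wgts = {'the': 0, 'cat': 1, 'sat': 2}     # old vocab: word -> row index
itos_new  = ['the', 'dog', 'sat']              # new vocab; 'dog' was never seen
wgts_m    = enc_wgts.mean(0)                   # mean row, fallback for unseen words
new_w = torch.zeros(len(itos_new), 4)
for i, w in enumerate(itos_new):
    r = stoi_wgts.get(w, -1)
    new_w[i] = enc_wgts[r] if r >= 0 else wgts_m   # reuse the old row or the mean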
class RNNLearner(Learner):
    "Basic class for a `Learner` in NLP."
    def __init__(self, data:DataBunch, model:nn.Module, split_func:OptSplitFunc=None, clip:float=None,
                 alpha:float=2., beta:float=1., metrics=None, **learn_kwargs):
        is_class = (hasattr(data.train_ds, 'y') and (isinstance(data.train_ds.y, CategoryList) or
                    isinstance(data.train_ds.y, LMLabelList)))
        metrics = ifnone(metrics, ([accuracy] if is_class else []))
        super().__init__(data, model, metrics=metrics, **learn_kwargs)
        self.callbacks.append(RNNTrainer(self, alpha=alpha, beta=beta))
        if clip: self.callback_fns.append(partial(GradientClipping, clip=clip))
        if split_func: self.split(split_func)
        self.output_dp = RNNDropout(output_p)
        if bias: self.decoder.bias.data.zero_()
        if tie_encoder: self.decoder.weight = tie_encoder.weight

    def forward(self, input:Tuple[Tensor,Tensor])->Tuple[Tensor,Tensor,Tensor]:
        raw_outputs, outputs = input
        output = self.output_dp(outputs[-1])
        decoded = self.decoder(output)
        return decoded, raw_outputs, outputs
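(The tie_encoder line is standard weight tying between the decoder and the embedding; a minimal illustration of what sharing the Parameter means, with made-up sizes:)

import torch.nn as nn

# the decoder and the embedding share one Parameter object, so updating
# or freezing one does the same to the other (sizes are made up)
emb = nn.Embedding(100, 32)    # vocab 100, embedding dim 32
dec = nn.Linear(32, 100)       # decoder maps hidden dim back to vocab
dec.weight = emb.weight        # tie: same Parameter, not a copy
assert dec.weight is emb.weight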
class SequentialRNN(nn.Sequential):
    "A sequential module that passes the reset call to its children."
    def reset(self):
        for c in self.children():
            if hasattr(c, 'reset'): c.reset()

def awd_lstm_lm_split(model:nn.Module) -> List[List[nn.Module]]:
    "Split a RNN `model` in groups for differential learning rates."
    groups = [[rnn, dp] for rnn, dp in zip(model[0].rnns, model[0].hidden_dps)]
    return groups + [[model[0].encoder, model[0].encoder_dp, model[1]]]
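(If I read the fastai source right, this split function is what gives freeze() its notion of the "last layer": freeze() roughly sets requires_grad=False on every layer group except the last. A simplified sketch, where learn stands for the learner object:)

# simplified sketch of what freeze() does with those groups: every group
# except the last becomes non-trainable (real fastai additionally keeps
# batchnorm-type layers trainable; that detail is omitted here)
groups = awd_lstm_lm_split(learn.model)
for group in groups[:-1]:
    for module in group:
        for p in module.parameters():
            p.requires_grad = False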
ecdrid (Aditya)
May 5, 2020, 10:30am
Try this,

for name, params in your_learner_object.model.named_parameters():
    if params.requires_grad:
        print(name)
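(The earlier AttributeError happened, as far as I can tell, because RNNLearner is a Learner, i.e. a plain training wrapper rather than an nn.Module; the actual nn.Module sits in its .model attribute. A toy sketch of the difference:)

import torch.nn as nn

class Wrapper:                                   # toy stand-in for fastai's Learner
    def __init__(self, model): self.model = model

learn = Wrapper(nn.Linear(2, 3))
# learn.named_parameters()                       # AttributeError: plain object, not a Module
for name, p in learn.model.named_parameters():   # works: .model is the nn.Module
    print(name)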
Thanks a lot! It works now!
It only prints parameters from the last layer group, so freeze() works, I think:
1.layers.0.weight
1.layers.0.bias
1.layers.2.weight
1.layers.2.bias
1.layers.4.weight
1.layers.4.bias
1.layers.6.weight
1.layers.6.bias
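(To double-check against the summary counts, the trainable total can also be compared with the overall total directly; learn again stands for the learner object:)

n_total = sum(p.numel() for p in learn.model.parameters())
n_train = sum(p.numel() for p in learn.model.parameters() if p.requires_grad)
print(f'{n_train:,} of {n_total:,} parameters are trainable')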