Hi, I am a beginner. I want to fine-tune a model using LoRA with the peft package, specifically this model: GitHub - lbcb-sci/RiNALMo: RiboNucleic Acid (RNA) Language Model. However, since only torch.nn.Linear and Conv1D layers are supported, I cannot select the layers I want to fine-tune: the linear layers for, say, the query, key, and value matrices are wrapped inside another module. If I print the named modules, the layers are basically wrapped. Can anyone help me circumvent this? Is there a way to access the linear layers that are wrapped inside a bigger module?
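For reference, this is roughly how I am inspecting the modules (just a sketch, assuming model is the loaded RiNALMo model):

import torch.nn as nn

# list every submodule and print the plain nn.Linear ones,
# since those are the candidates that LoRA could wrap
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name, type(module).__name__)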
What do you mean by "they are wrapped in another module"?
Like in Hugging Face Transformers, the query matrix is at:
model.vit.encoder.layer[0].attention.attention.query
Is this what you need?
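For a plain Hugging Face model you would normally just pass the suffixes of those module names to LoraConfig, something like this (only a sketch, the r / lora_alpha values are arbitrary):

from peft import LoraConfig, get_peft_model

# target the inner linear layers by their module name suffixes
config = LoraConfig(r=8, lora_alpha=16, target_modules=["query", "key", "value"])
peft_model = get_peft_model(model, config)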
Thanks for the response! So if you do the following for this RiNALMo model:
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
You will get something like this:
embedding.weight
transformer.blocks.0.mh_attn.Wqkv.weight
transformer.blocks.0.mh_attn.out_proj.weight
transformer.blocks.0.attn_layer_norm.weight
transformer.blocks.0.attn_layer_norm.bias
transformer.blocks.0.transition.0.beta
transformer.blocks.0.transition.0.linear.weight
transformer.blocks.0.transition.0.linear.bias
transformer.blocks.0.transition.0.linear_gate.weight
transformer.blocks.0.transition.0.linear_gate.bias
transformer.blocks.0.transition.2.weight
transformer.blocks.0.transition.2.bias
transformer.blocks.0.out_layer_norm.weight
transformer.blocks.0.out_layer_norm.bias
transformer.blocks.1.mh_attn.Wqkv.weight
transformer.blocks.1.mh_attn.out_proj.weight
transformer.blocks.1.attn_layer_norm.weight
transformer.blocks.1.attn_layer_norm.bias
transformer.blocks.1.transition.0.beta
transformer.blocks.1.transition.0.linear.weight
transformer.blocks.1.transition.0.linear.bias
transformer.blocks.1.transition.0.linear_gate.weight
transformer.blocks.1.transition.0.linear_gate.bias
transformer.blocks.1.transition.2.weight
transformer.blocks.1.transition.2.bias
transformer.blocks.1.out_layer_norm.weight
My question is: how can I fine-tune the query, key, and value matrices when they are wrapped inside mh_attn? If I inspect the model, the layers I want to fine-tune with LoRA are inside "flash_self_attn", but they do not show up in this list.
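For example, I would have expected something like this to work (just a sketch, assuming Wqkv and out_proj inside mh_attn are plain nn.Linear layers, which is exactly what I am not sure about):

from peft import LoraConfig, get_peft_model

# try to target the fused query/key/value projection and the output projection
# inside mh_attn by their module name suffixes
config = LoraConfig(r=8, lora_alpha=16, target_modules=["Wqkv", "out_proj"])
peft_model = get_peft_model(model, config)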
Wow, weird. Can you try printing all parameters regardless of their requires_grad attribute, to see if "flash_self_attn" is even there?
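Something like this should list everything (the same loop as before, just without the requires_grad filter):

# print every parameter name, trainable or not,
# to check whether anything under "flash_self_attn" shows up
for name, param in model.named_parameters():
    print(name, param.requires_grad)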