Hi, I am a beginner. I want to fine-tune a model using LoRA with the peft package, specifically this model: GitHub - lbcb-sci/RiNALMo: RiboNucleic Acid (RNA) Language Model. However, since only torch.nn.Linear and Conv1D layers are supported, I cannot select the layers I want to fine-tune: the linear layers for, say, the query, key, and value matrices are wrapped inside another module. If I print the named modules, the layers are basically wrapped. Can anyone help me circumvent this? Is there a way to access the linear layers that are wrapped inside a bigger module?
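For reference, this is roughly how I am inspecting the modules (just a sketch, assuming model is the loaded RiNALMo model):

import torch.nn as nn

# list every submodule and print the plain nn.Linear ones,
# since those are the candidates that LoRA could wrap
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name, type(module).__name__)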
What do you mean by "they are wrapped in another module"?
Like in Hugging Face Transformers, the query matrix is at:
model.vit.encoder.layer[0].attention.attention.query
Is this what you need?
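For a plain Hugging Face model you would normally just pass the suffixes of those module names to LoraConfig, something like this (only a sketch, the r / lora_alpha values are arbitrary):

from peft import LoraConfig, get_peft_model

# target the inner linear layers by their module name suffixes
config = LoraConfig(r=8, lora_alpha=16, target_modules=["query", "key", "value"])
peft_model = get_peft_model(model, config)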
Thanks for the response! So if you do the following for this RiNALMo model:
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
You will get something like this:
embedding.weight
transformer.blocks.0.mh_attn.Wqkv.weight
transformer.blocks.0.mh_attn.out_proj.weight
transformer.blocks.0.attn_layer_norm.weight
transformer.blocks.0.attn_layer_norm.bias
transformer.blocks.0.transition.0.beta
transformer.blocks.0.transition.0.linear.weight
transformer.blocks.0.transition.0.linear.bias
transformer.blocks.0.transition.0.linear_gate.weight
transformer.blocks.0.transition.0.linear_gate.bias
transformer.blocks.0.transition.2.weight
transformer.blocks.0.transition.2.bias
transformer.blocks.0.out_layer_norm.weight
transformer.blocks.0.out_layer_norm.bias
transformer.blocks.1.mh_attn.Wqkv.weight
transformer.blocks.1.mh_attn.out_proj.weight
transformer.blocks.1.attn_layer_norm.weight
transformer.blocks.1.attn_layer_norm.bias
transformer.blocks.1.transition.0.beta
transformer.blocks.1.transition.0.linear.weight
transformer.blocks.1.transition.0.linear.bias
transformer.blocks.1.transition.0.linear_gate.weight
transformer.blocks.1.transition.0.linear_gate.bias
transformer.blocks.1.transition.2.weight
transformer.blocks.1.transition.2.bias
transformer.blocks.1.out_layer_norm.weight
My question is: how can I fine-tune the query, key, and value matrices when they are wrapped inside mh_attn? If I inspect the model, the layers I want to fine-tune with LoRA are inside "flash_self_attn", but they do not show up in this list.
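For example, I would have expected something like this to work (just a sketch, assuming Wqkv and out_proj inside mh_attn are plain nn.Linear layers, which is exactly what I am not sure about):

from peft import LoraConfig, get_peft_model

# try to target the fused query/key/value projection and the output projection
# inside mh_attn by their module name suffixes
config = LoraConfig(r=8, lora_alpha=16, target_modules=["Wqkv", "out_proj"])
peft_model = get_peft_model(model, config)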
Wow, weird. Can you try printing all parameters regardless of their requires_grad attribute, to see if "flash_self_attn" is even there?
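Something like this should list everything (the same loop as before, just without the requires_grad filter):

# print every parameter name, trainable or not,
# to check whether anything under "flash_self_attn" shows up
for name, param in model.named_parameters():
    print(name, param.requires_grad)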