Parameter-Efficient Fine-Tuning for CNN

Dear community,

  1. Is there a Parameter-Efficient Fine-Tuning technique like LoRA (low-rank adaptation) available for CNNs, to reduce GPU memory usage while training/fine-tuning the network?

  2. Is it possible to apply LoRA to CNN kernels (3×3, 5×5, 7×7)?

Any article recommendations would be very helpful. Thanks!

It seems like Microsoft has written a LoRAConv layer; you can find it here.

I implemented a ton of PEFT methods for both CNNs and ViTs. Check my paper here - [2305.08252] Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity

Do you have your code somewhere? I'm particularly interested in trying SSF, but it performs very poorly for me, so I must be doing something wrong.

The official code is here - GitHub - dongzelian/SSF: [NeurIPS'22] This is an official implementation for "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning".

Thank you for your response! I read in your paper that you added SSF to a ResNet-50. Did you add it only to the ResBlocks, or throughout the entire model? I tried adding it to the ResBlocks of a SegResNet like this:
```python
# in __init__:
scale, shift = init_ssf_scale_shift(in_channels)
self.ssf_scale = scale
self.ssf_shift = shift

def forward(self, x):
    if self.tuning_mode == "ssf":
        identity = x

        x = self.norm1(x)
        x = self.act(x)
        x = ssf_ada(self.conv1(x), self.ssf_scale, self.ssf_shift)

        x = self.norm2(x)
        x = self.act(x)
        x = ssf_ada(self.conv2(x), self.ssf_scale, self.ssf_shift)

        x += identity
```

Then I train only the parameters with "ssf" in their name, and this gives very poor results.
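For reference, here is a minimal, self-contained sketch of that selective-unfreezing step (the toy model and its names are my own illustration, not the actual SegResNet):

```python
import torch
import torch.nn as nn

# Toy stand-in for a block with SSF parameters (hypothetical names): one
# conv to be frozen plus an SSF scale/shift pair to be trained.
model = nn.Module()
model.conv1 = nn.Conv2d(8, 8, 3, padding=1)
model.ssf_scale = nn.Parameter(torch.ones(8))
model.ssf_shift = nn.Parameter(torch.zeros(8))

# Freeze everything except parameters whose name contains "ssf"
for name, param in model.named_parameters():
    param.requires_grad = "ssf" in name

trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
print(trainable)  # ['ssf_scale', 'ssf_shift']
```

If this matches what you're doing, the freezing itself shouldn't be the problem.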

Yes, LoRA can be adapted to CNNs to reduce GPU memory usage during fine-tuning. For the foundational concepts, see "LoRA: Low-Rank Adaptation of Large Language Models".
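To make the kernel question concrete, here is a minimal sketch of one common way to apply LoRA to a k×k conv (my own illustration, not Microsoft's loralib implementation; assumes groups=1 and no dilation): keep the pretrained conv frozen and add a low-rank branch, a k×k conv down to rank r followed by a 1×1 conv back up, with the up-projection zero-initialized so training starts from the pretrained behavior.

```python
import torch
import torch.nn as nn


class LoRAConv2d(nn.Module):
    """Frozen Conv2d plus a trainable low-rank update:
    base(x) + (alpha / r) * up(down(x)), where `down` is a k x k conv
    to rank r and `up` is a 1 x 1 conv back to out_channels."""

    def __init__(self, base_conv: nn.Conv2d, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base_conv
        self.base.weight.requires_grad = False  # freeze pretrained kernel
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        self.scaling = alpha / r
        # Low-rank branch: in_ch -> r with the same kernel/stride/padding,
        # then r -> out_ch with a 1x1 conv.
        self.lora_down = nn.Conv2d(
            base_conv.in_channels, r,
            kernel_size=base_conv.kernel_size,
            stride=base_conv.stride,
            padding=base_conv.padding,
            bias=False,
        )
        self.lora_up = nn.Conv2d(r, base_conv.out_channels, kernel_size=1, bias=False)
        nn.init.kaiming_uniform_(self.lora_down.weight, a=5 ** 0.5)
        nn.init.zeros_(self.lora_up.weight)  # update starts at zero

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_up(self.lora_down(x))


# Usage: wrap an existing 3x3 conv; only the two small LoRA convs train.
conv = nn.Conv2d(16, 32, 3, padding=1)
layer = LoRAConv2d(conv, r=4)
x = torch.randn(2, 16, 8, 8)
print(layer(x).shape)  # torch.Size([2, 32, 8, 8])
```

Because the up-projection starts at zero, the wrapped layer initially reproduces the pretrained conv exactly, and the trainable parameter count drops from out·in·k² to roughly r·(in·k² + out).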