Full finetune, LoRA and feature extraction take the same amount of memory and time to train

I need to compare full finetuning and LoRA on a T5 model for a summarization task. The problem is that on my GPU a full finetune takes ~4.5 GB of VRAM and about an hour to train, and LoRA takes the same amount of memory and time with the same hyperparameters. Sometimes it even takes 100-200 MB more, according to PyTorch. Am I missing something? Do I need to set it up differently?
Here’s a snippet of my code. I originally had my own training loop but then switched to the Trainer class, though it made no difference.
Finetune setup:

from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained('t5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base').to(device)
train_args = TrainingArguments(
    gradient_accumulation_steps=2, 
    gradient_checkpointing=True,
    learning_rate=lr,
    num_train_epochs=epochs,
    warmup_steps=warmup,
    optim='adafactor',
    )
trainer = Trainer(model, train_args, train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
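
(For context, the memory figures above come from PyTorch's allocator stats; roughly a check like this after training, not my exact script:)

import torch

# Peak GPU allocation PyTorch has seen during the run --
# the kind of number behind the ~4.5 GB figure above.
peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
print(f"peak VRAM: {peak_gb:.2f} GB")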

LoRA setup:

from peft import LoraConfig, get_peft_model

# LoRA adapters on the query/value projections of T5's attention layers
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q", "v"],
    lora_dropout=0.1,
    bias="none",
)
tokenizer = AutoTokenizer.from_pretrained('t5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base').to(device)
model = get_peft_model(model, config)
train_args = TrainingArguments(
    gradient_accumulation_steps=2, 
    gradient_checkpointing=True,
    learning_rate=lr,
    num_train_epochs=epochs,
    warmup_steps=warmup,
    optim='adafactor',
    )
trainer = Trainer(model, train_args, train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
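
As a sanity check that the LoRA wrapping actually restricts what is trainable, PEFT's print_trainable_parameters helper can be called on the wrapped model (not something the snippet above relies on, just a quick check):

# Should report well under 1% of parameters as trainable
# if the LoRA adapters were applied correctly.
model.print_trainable_parameters()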

Out of curiosity I also tried freezing all weights except the lm_head (~11% of the parameters) by setting requires_grad = False, and again there was no difference compared with the full finetune.
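
Roughly what I mean by freezing, as a sketch (the exact loop isn't the point):

# Freeze everything, then unfreeze only the LM head.
# Note: in T5 the LM head weight may be tied to the shared input embedding.
for param in model.parameters():
    param.requires_grad = False
for param in model.lm_head.parameters():
    param.requires_grad = True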