Running out of GPU memory while fine-tuning Llama 2 on a Colab T4

Hey guys, I’m trying to fine-tune a sharded Llama 2 model for a college project, but I keep running out of GPU memory almost instantly

This code is based on a template I found online…

I tried setting max_split_size_mb based on the error message, but that doesn’t seem to help either
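
For reference, this is roughly what I did at the top of the notebook, before anything touches the GPU (the 128 is just a value I picked, no idea if it’s sensible):

import os

# needs to be set before the first CUDA allocation or it gets ignored
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"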

from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 100
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 100
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)

from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)

Then I’m simply calling trainer.train()

I’m using the free Google Colab tier with a T4 GPU. I don’t want to upgrade to Pro, since I’m a cash-strapped college student lmao, and I lack the experience to know how much that would even help in the first place

Any suggestions on how to manage resources properly would be appreciated

Here’s the error message for reference:

OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 51.06 MiB is free. Process 11288 has 14.70 GiB memory in use. Of the allocated memory 14.27 GiB is allocated by PyTorch, and 304.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Do you know how much memory the model would need during training? 15GB might not be enough.
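
As a rough back-of-envelope (assuming it’s the 7B model and the weights actually end up in fp16 rather than 4-bit), the weights alone are already around 13 GB before gradients, optimizer states, or activations:

# rough estimate: 7B parameters at 2 bytes each (fp16)
params = 7e9
weights_gb = params * 2 / 1024**3
print(f"{weights_gb:.1f} GB just for the weights")  # ~13.0 GB

That alone would nearly fill a 15 GB T4 the moment the model loads, unless the template is quantizing it somewhere.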

I don’t really know… it was spiking up to 14.7 GB almost instantly…
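
Mostly I was just watching the Colab resource panel, but I also printed something like this right after loading the model (assuming I’m reading these numbers correctly):

import torch

# memory PyTorch has handed out vs. what it has reserved from the driver
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")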