Help: CUDA error: out of memory

I don't know what Pinokio is, but note that the PyTorch binaries ship with their own CUDA runtime dependencies; your locally installed CUDA toolkit is only used if you build PyTorch from source or build a custom CUDA extension. Did you build PyTorch from source? If not, the newly installed CUDA toolkit would be irrelevant.
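If you want to double-check which CUDA runtime your PyTorch binary actually uses (as opposed to what nvcc reports for the local toolkit), a quick sanity check:

import torch

print(torch.__version__)              # PyTorch build version
print(torch.version.cuda)             # CUDA runtime bundled with this binary
print(torch.cuda.is_available())      # True if PyTorch can see the GPU
print(torch.cuda.get_device_name(0))  # name of the detected GPU

nvcc --version (or nvidia-smi) may report a different version; that is expected and harmless when using the pip/conda binaries.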

import torch
import torch.nn as nn
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

torch.cuda.empty_cache()

# Load pre-trained model and tokenizer
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir="./model")
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./model")

# Move model to the appropriate device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
model = model.to(device)  # this line was missing; without it the model never leaves the CPU

def get_model_response(input_text):
    # Tokenize the input text
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Generate a response from the model
    output = model.generate(input_ids, max_new_tokens=100, num_return_sequences=1, do_sample=True, temperature=0.001)

    # Decode the generated response
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

# Example usage:
user_input = "What is the capital of India?"
response = get_model_response(user_input)
print("Model:", response)

Can anyone please help me? I am getting the following error. I have an i5 13th gen, an RTX 4050 with 6 GB of VRAM, and 32 GB of DDR5 RAM, and I have successfully downloaded the model to my local machine, but I am still getting this error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 6.00 GiB of which 0 bytes is free. Of the allocated memory 20.45 GiB is allocated by PyTorch, and 13.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (CUDA semantics — PyTorch 2.2 documentation)

Please help me. Thanks 🙂 Happy coding!
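For what it's worth, Llama-2-7B in full fp32 needs roughly 28 GB for the weights alone (7B parameters × 4 bytes), so it cannot fit in 6 GB of VRAM the way it is loaded above. A minimal sketch of loading it 4-bit quantized instead, assuming the bitsandbytes and accelerate packages are installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"

# Quantize the weights to 4 bit on load: roughly 3.5 GB instead of ~28 GB in fp32
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir="./model",
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the layers on the GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./model")

With device_map="auto" you should not call model.to(device) afterwards; accelerate has already placed the weights.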

Hi Patrick,
how can we incrementally train our model?
Initially I downloaded the bert-base-uncased model and fine-tuned it with some data,
saved the model on my machine, and when I load this model again to train on new data chunks
I get this error message:
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

If you can guide me,
thanks!

Did you change the batch size or anything else in the model? Since the initial model was working, I would also assume loading the fine-tuned one should work.

In both the initial model training and the later incremental training:
per_device_train_batch_size=8

1st model training:

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

I saved this model as custom_model and loaded it as the second model.
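The save step, for reference — a minimal sketch assuming the standard Hugging Face save_pretrained API:

# After the first training run, persist the fine-tuned weights locally
model.save_pretrained("custom_model")
tokenizer.save_pretrained("custom_model")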

2nd model training:

model2 = BertForSequenceClassification.from_pretrained("custom_model")

While initializing the trainer I got this error:

trainer = Trainer(
    model=model2,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)

Error:

/home/ibex/anaconda3/envs/NLP/lib/python3.10/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to Accelerator is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an accelerate.DataLoaderConfiguration instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
warnings.warn(
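Note that the message above is only a FutureWarning from accelerate, not the out-of-memory error itself. If the OOM happens because the first model is still resident on the GPU when the second one is loaded in the same session, explicitly freeing it first may help — a minimal sketch, assuming both trainings run in one Python process:

import gc
import torch
from transformers import BertForSequenceClassification

# Drop all references to the first trainer/model before loading the second
del trainer
del model
gc.collect()              # release the Python-side objects
torch.cuda.empty_cache()  # return cached blocks to the CUDA driver

model2 = BertForSequenceClassification.from_pretrained("custom_model")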