AutoModelForCausalLM dataset process

Essentially, I don't understand how to train an autoregressive model: what the labels should be and what the loss function is. The training loss hardly decreases, and I suspect I am handling the labels incorrectly. Here is my code.

```python
def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["text"],
        max_length=context_length,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    labels = tokenizer(
        examples["label"],
        max_length=context_length,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = dataset_train.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=dataset_train.column_names,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    # preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)
```
I renamed the `answer` column of the GSM8K dataset to `label` and the `question` column to `text`.
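For context, my understanding (which may be wrong) is that for a causal LM the labels are the input IDs of the full sequence itself — the model shifts them internally — so the question and answer are concatenated into one sequence, with the prompt and padding positions set to -100 so they are ignored by the loss, rather than tokenizing the answer separately as I did above. A toy sketch of that label layout, without the tokenizer (all token IDs below are made up):

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the cross-entropy loss

def build_labels(prompt_ids, answer_ids, pad_id, max_length):
    """Concatenate prompt and answer; mask prompt and padding positions in the labels."""
    input_ids = (prompt_ids + answer_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids)[:max_length]
    pad_len = max_length - len(input_ids)
    input_ids = input_ids + [pad_id] * pad_len
    labels = labels + [IGNORE_INDEX] * pad_len
    return input_ids, labels

# Made-up token IDs for illustration
prompt_ids = [5, 6, 7]   # tokenized question
answer_ids = [8, 9]      # tokenized answer
input_ids, labels = build_labels(prompt_ids, answer_ids, pad_id=0, max_length=8)
# input_ids: [5, 6, 7, 8, 9, 0, 0, 0]
# labels:    [-100, -100, -100, 8, 9, -100, -100, -100]
```

Is this the right way to construct labels, or should the collator handle it?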

It is very unclear what your hyper-parameters are