Why does transformers' DataCollatorForSeq2Seq take model as a parameter?

Hi,

For training any seq2seq model, say BART:

from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Why does DataCollatorForSeq2Seq take the model as a parameter when the model is also passed to Seq2SeqTrainer?
What exactly does it do with the tokenizer and model inputs?

I appreciate your help.

Thank You.

From the docs:

model (PreTrainedModel): The model that is being trained. If set and has the prepare_decoder_input_ids_from_labels method, use it to prepare the decoder_input_ids.
This is useful when using label_smoothing to avoid calculating loss twice.
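To get a concrete sense of what that method does, here is a small example (assuming a BART checkpoint such as facebook/bart-base can be loaded; the exact token values in the output depend on the model's config):

```python
import torch
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Labels as they come out of the data collator: -100 marks padded
# positions that should be ignored by the loss.
labels = torch.tensor([[42, 17, 2, -100, -100]])

# BART's implementation shifts the labels one position to the right,
# puts decoder_start_token_id at the front, and replaces -100 with the
# pad token, producing the teacher-forcing decoder inputs.
decoder_input_ids = model.prepare_decoder_input_ids_from_labels(labels=labels)
print(decoder_input_ids)
```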

I would assume it's an optimization: preparing decoder_input_ids in the collator apparently avoids some duplicate computation later (per the docs, computing the loss twice when label smoothing is used).
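Roughly speaking, the collator uses its two inputs like this (a simplified sketch of the idea, not the actual transformers implementation):

```python
import torch

class SketchSeq2SeqCollator:
    """Simplified illustration of DataCollatorForSeq2Seq's behavior."""

    def __init__(self, tokenizer, model=None, label_pad_token_id=-100):
        self.tokenizer = tokenizer
        self.model = model
        self.label_pad_token_id = label_pad_token_id

    def __call__(self, features):
        # 1) The tokenizer pads input_ids / attention_mask to the
        #    longest sequence in the batch.
        labels = [f.pop("labels") for f in features]
        batch = self.tokenizer.pad(features, return_tensors="pt")

        # 2) Labels are padded separately with -100 so the padded
        #    positions are ignored by the loss.
        max_len = max(len(l) for l in labels)
        batch["labels"] = torch.tensor(
            [l + [self.label_pad_token_id] * (max_len - len(l)) for l in labels]
        )

        # 3) If a model with prepare_decoder_input_ids_from_labels was
        #    passed, the collator builds decoder_input_ids here, so they
        #    don't have to be derived again inside the training step.
        if self.model is not None and hasattr(
            self.model, "prepare_decoder_input_ids_from_labels"
        ):
            batch["decoder_input_ids"] = self.model.prepare_decoder_input_ids_from_labels(
                labels=batch["labels"]
            )
        return batch
```

So the tokenizer handles batching/padding, while the model (if given) is only used for that one hook; passing the model to the collator as well as the Trainer is therefore not redundant.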
The HuggingFace discussion board might be a better place for these HF-specific questions as you would find more experts there. :wink:
