Having trouble understanding what loss is currently in use

I was going through this Hugging Face code and I am having trouble understanding what loss the model is currently using. Although I know most seq2seq models use cross-entropy loss, I don't see the definition anywhere in the code:

huggingface/transformers/blob/aca6288ff42cebded5421020f0ff088adeb446dd/examples/language-modeling/run_clm.py

Actually, I want to train the model with a new custom loss. I have trained a baseline model and it's working fine.

Thank you

The transformers library wires the loss into the model's forward; for Llama, it is indeed CrossEntropyLoss on the next token.
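As a rough sketch of the idea (not the library's exact code): when you pass `labels`, the logits are shifted by one position against the labels and fed to cross-entropy, with -100 as the ignore index for masked positions.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
    # Shift so that tokens at positions < n predict the token at position n.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # Flatten and compute cross-entropy; label -100 marks ignored positions.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )
```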

I am not sure I would recommend following that style in your own modelling, though.

For easier-to-understand implementations, A. Karpathy's NanoGPT or Lightning.ai's LitGPT (disclosure: I do some work for them) might be good choices.

Best regards

Thomas

Thanks for the quick reply. Currently I am using pre-trained Llama models from Hugging Face, and I want to fine-tune Llama with a weighted loss function. Any idea how I can integrate it into the transformers library? I found some links related to that, but it does not seem to work.

You could subclass the model and reimplement the forward with your modification, along the lines of the sketch below.
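A minimal sketch, assuming you want per-vocabulary-class weights; the `WeightedLlamaForCausalLM` name, the `class_weights` attribute, and the checkpoint name are my own examples, not part of the library:

```python
import torch
import torch.nn.functional as F
from transformers import LlamaForCausalLM
from transformers.modeling_outputs import CausalLMOutputWithPast

class WeightedLlamaForCausalLM(LlamaForCausalLM):
    """Hypothetical subclass replacing the built-in loss with a weighted one."""

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # Call the parent forward without labels so it skips its own loss.
        outputs = super().forward(
            input_ids=input_ids, attention_mask=attention_mask, **kwargs
        )
        loss = None
        if labels is not None:
            # Same shift as the stock next-token loss ...
            shift_logits = outputs.logits[:, :-1, :].contiguous()
            shift_labels = labels[:, 1:].contiguous()
            # ... but with per-class weights; `self.class_weights` is an
            # assumed (vocab_size,) tensor that you set after loading the
            # model (it is a plain attribute, so move it to the right device).
            loss = F.cross_entropy(
                shift_logits.view(-1, shift_logits.size(-1)),
                shift_labels.view(-1),
                weight=self.class_weights.to(shift_logits.device),
                ignore_index=-100,
            )
        return CausalLMOutputWithPast(
            loss=loss,
            logits=outputs.logits,
            past_key_values=outputs.past_key_values,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
```

Usage would then look something like this, after which the model should work with the Trainer as usual, since it still returns the loss in the standard output object:

```python
model = WeightedLlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.class_weights = torch.ones(model.config.vocab_size)  # adjust weights as needed
```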

However, I do think that, fundamentally, transformers is a library targeted at people using the models as-is. The two repos that I linked (which both have A. Karpathy's earlier MinGPT as a reference point) deliberately do things differently: there, it is intended that you take and modify the code rather than use them just as a library.

Best regards

Thomas