Cross-entropy loss and PyTorch Lightning loss do not match

Ayush_Singh · August 27, 2021, 4:21am

Hi, I am building a question-answering model using T5 (Hugging Face) inside a PyTorch Lightning module. When I compare the loss I compute myself with the loss that is returned by the model and logged by PyTorch Lightning, the two do not match.

import pytorch_lightning as pl
from torch.nn import CrossEntropyLoss
from transformers import T5ForConditionalGeneration


class UQAFineTuneModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(
            "allenai/unifiedqa-t5-small", return_dict=True
        )
        self.model.train()

    def forward(
        self,
        source_text_input_ids,
        source_text_attention_mask,
        target_text_input_ids=None,
    ):
        output = self.model(
            input_ids=source_text_input_ids,
            attention_mask=source_text_attention_mask,
            labels=target_text_input_ids,
        )
        return output.loss, output.logits

    def training_step(self, batch, batch_idx):
        source_text_input_ids = batch["source_text_input_ids"]
        source_text_attention_mask = batch["source_text_attention_mask"]
        target_text_input_ids = batch["target_text_input_ids"]
        # labels_attention_mask = batch["target_text_attention_mask"]
        # First forward pass: loss computed by Hugging Face from the raw target ids.
        loss, outputs = self(
            source_text_input_ids, source_text_attention_mask, target_text_input_ids
        )
        loss_mine = None
        # Second forward pass with the same inputs; its logits feed the manual loss.
        output = self.model(
            input_ids=source_text_input_ids,
            attention_mask=source_text_attention_mask,
            labels=target_text_input_ids,
        )
        # Replace pad id 0 with -100 so CrossEntropyLoss skips those positions.
        labels = batch["target_text_input_ids"].clone()
        labels[labels == 0] = -100
        if target_text_input_ids is not None:
            loss_fct = CrossEntropyLoss(ignore_index=-100)
            loss_mine = loss_fct(output.logits.view(-1, outputs.size(-1)), labels.view(-1))
            print(f"loss_Hugginface: {loss.item()}, loss_mine : {loss_mine.item()}")
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": outputs}

As the printed output shows, the two losses are not the same. Why is that? Any help is much appreciated. I asked the same question on the Hugging Face forum, but they told me to ask here; you can view that discussion here.

gphilip · August 27, 2021, 4:56am

(I am pretty much ignorant about Lightning, so my comment may not make sense.)

Could you show the code for the training loop, where you do the actual training?

Ayush_Singh · August 27, 2021, 6:12am

Yeah, sure:

pl_model = UQAFineTuneModel()
trainer = pl.Trainer(
    max_epochs=args.model_config["num_epochs"],
    default_root_dir=Path(args.exp_directory),
    logger=args.lgr,
    callbacks=[early_stopping_callback, checkpoint_callback],
    gpus=torch.cuda.device_count() if args.torch_device == "gpu" else None,
    auto_select_gpus=args.torch_device == "gpu",
)
trainer.fit(
    pl_model, train_dataloader=train_data_loader, val_dataloaders=val_data_loader
)
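
One thing that may be worth checking in the training_step above: the labels passed to the model still contain the pad id 0, while the manual loss replaces 0 with -100 before calling CrossEntropyLoss. The sketch below uses random tensors in place of the T5 logits, so it is only an illustration and not the actual model; the names PAD_ID, vocab_size, logits and target_ids are made up for the example, and it assumes the pad id is 0. It shows that masking the padding changes the cross-entropy value by itself.

```python
import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)

PAD_ID = 0      # assumption: padding positions in the targets use id 0
vocab_size = 8  # hypothetical tiny vocabulary, for illustration only

# Random stand-in for output.logits: batch of 2 sequences, length 4.
logits = torch.randn(2, 4, vocab_size)
# Fake target ids; the trailing zeros play the role of padding.
target_ids = torch.tensor([[5, 3, 1, PAD_ID], [2, 1, PAD_ID, PAD_ID]])

loss_fct = CrossEntropyLoss(ignore_index=-100)

# Padding left as 0: pad positions are scored like any other target.
loss_unmasked = loss_fct(logits.view(-1, vocab_size), target_ids.view(-1))

# Padding replaced by -100: pad positions are skipped by ignore_index.
labels = target_ids.clone()
labels[labels == PAD_ID] = -100
loss_masked = loss_fct(logits.view(-1, vocab_size), labels.view(-1))

print(f"unmasked: {loss_unmasked.item():.4f}, masked: {loss_masked.item():.4f}")
```

If this is what is happening, passing the masked labels to the model as well should bring the two numbers closer. Separately, the training_step runs two forward passes while the model is in train mode, so dropout could also make the two sets of logits differ; computing the manual loss from the logits of the first pass would rule that out.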