Why are losses combined as losses = sum(loss for loss in loss_dict.values()) and never as sum(loss_dict.values())?

Hello everyone!
Why are losses combined as

    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    optimizer.zero_grad()
    losses.backward()

but never as

    losses = sum(loss_dict.values())

Is it the same?

You have a very specific model that returns losses instead of predictions, and it also returns them as a dictionary. Usually models return predictions, then loss functions return the loss(es) as tensor(s), and you can access their values via .item()
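For comparison, here is a minimal sketch of that usual pattern; the tiny nn.Linear classifier and the random data are placeholders just for illustration:

    import torch
    import torch.nn as nn

    # The "usual" pattern: the model returns predictions,
    # a separate criterion returns a single loss tensor.
    model = nn.Linear(10, 3)          # stand-in for an ordinary classifier
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(4, 10)
    y = torch.randint(0, 3, (4,))

    logits = model(x)                 # predictions
    loss = criterion(logits, y)       # one loss tensor
    print(loss.item())                # scalar value for logging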

Thank you for the answer!

As we can see here vision/engine.py at 59ec1dfd550652a493cb99d5704dcddae832a204 · pytorch/vision · GitHub, all detection models in torchvision return a dictionary of losses in train mode (and in eval mode they return predictions). I think this is a common approach because it encapsulates the losses inside the model, which is very convenient.

But anyway, combining losses has a lot of subtleties, because we combine computation graphs with gradients, not just values. More details here: python - How can i process multi loss in pytorch? - Stack Overflow
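For concreteness, in train mode a torchvision detection model returns exactly such a dict; a rough sketch (assuming torchvision >= 0.13 for the weights arguments, older releases use pretrained=False) might look like this, where summing the values gives one tensor whose backward() reaches every loss branch:

    import torch
    import torchvision

    # Faster R-CNN in train mode returns a dict of scalar loss tensors,
    # e.g. loss_classifier, loss_box_reg, loss_objectness, loss_rpn_box_reg.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights=None, weights_backbone=None
    )
    model.train()

    images = [torch.rand(3, 224, 224)]
    targets = [{
        "boxes": torch.tensor([[10.0, 10.0, 100.0, 100.0]]),
        "labels": torch.tensor([1]),
    }]

    loss_dict = model(images, targets)
    print(loss_dict.keys())

    # Each value carries its own graph; the sum is one tensor whose
    # backward() propagates through all of the loss heads at once.
    total = sum(loss for loss in loss_dict.values())
    total.backward()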

I guess there is some reason, related to autograd or something else, why everywhere the losses are combined via a Python comprehension sum(loss for loss in loss_dict_reduced.values()) and not simply sum(loss_dict_reduced.values())

One person told me that it is related to TorchScript, which supposedly does not support summing dictionary values directly. But he was not sure.

Ah, I see. I think this is because dict.values() is returned as an iterator-like view (similar to the old itervalues), and [loss for loss in loss_dict.values()] converts that iterator to a list.

Yes, that is true, [loss for loss in loss_dict.values()] converts the iterator to a list. But the question was: why should we iterate over the values and then sum them with sum(loss for loss in loss_dict.values())? Why can't we sum the values directly with sum(loss_dict.values())? What is the difference?
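For what it is worth, Python's built-in sum accepts any iterable, so with plain tensors both spellings give the same result; here is a small sketch with dummy 0-dim tensors standing in for the real losses:

    import torch

    # Dummy losses standing in for what the detection model would return.
    loss_dict = {
        "loss_classifier": torch.tensor(0.5, requires_grad=True),
        "loss_box_reg": torch.tensor(0.3, requires_grad=True),
    }

    # sum(...) is Python's built-in sum; it accepts any iterable, so a
    # generator expression and the dict view behave the same here.
    a = sum(loss for loss in loss_dict.values())
    b = sum(loss_dict.values())

    print(torch.equal(a, b))                 # True
    print(a.requires_grad, b.requires_grad)  # both carry a graph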

Right. That code was written about 3 years ago; I guess torch.sum was not able to sum over an iterator back in early 2019.

Maybe.
But in code like this

    with torch.cuda.amp.autocast(enabled=scaler is not None):
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

    # reduce losses over all GPUs for logging purposes
    loss_dict_reduced = utils.reduce_dict(loss_dict)
    losses_reduced = sum(loss for loss in loss_dict_reduced.values())

    loss_value = losses_reduced.item()

or this

    for i, data in enumerate(train_dataloader):
        optimizer.zero_grad()
        images, targets = data[0], data[1]
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        loss = sum(loss for loss in loss_dict.values())
        running_loss += loss.item()
        loss.backward()
        optimizer.step()

we cannot use torch.sum at all, because it sums over a tensor, whereas loss_dict.values() is a collection of tensors, not a single tensor.
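And if someone really wanted a torch op instead of Python's built-in sum, the values would first have to be stacked into one tensor, roughly like this sketch:

    import torch

    loss_dict = {
        "loss_a": torch.tensor(0.5, requires_grad=True),
        "loss_b": torch.tensor(0.3, requires_grad=True),
    }

    # torch.sum expects a tensor, so the dict values must be stacked first.
    losses = torch.stack(list(loss_dict.values())).sum()

    # Equivalent to the built-in sum over the same values.
    print(torch.allclose(losses, sum(loss_dict.values())))  # True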

If we search on Google, we see a lot of examples that use the variant loss = sum(loss for loss in loss_dict.values()) and never the variant sum(loss_dict.values()): loss_dict.values pytorch - Google Search

I found another variant here https://isrc.iscas.ac.cn/gitlab/research/domain-adaption/-/commit/7fb5ee486e519de46175a44f6f67455334d4a549 but they do a conversion too:

        loss_dict, _ = model(images, img_metas, targets)
        losses = sum(list(loss_dict.values()))

        loss_dict_reduced = dist_utils.reduce_dict(loss_dict)

Maybe it is only code style; maybe using the variant without the conversion, sum(loss_dict.values()), in our project would not lead to unwanted effects (like errors), but I am not sure.

Right, it is the same. There are a lot of examples in that repository: Search · loss_dict · GitHub
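As a sanity check under that assumption, here is a small sketch that runs both variants through backward() and compares the gradients (the two toy parameters are made up just for the comparison):

    import torch

    def make_loss_dict():
        # Two toy parameters standing in for model weights.
        w1 = torch.tensor(2.0, requires_grad=True)
        w2 = torch.tensor(3.0, requires_grad=True)
        return {"loss_a": w1 ** 2, "loss_b": w2 ** 2}, (w1, w2)

    # Variant 1: generator expression, as in torchvision's engine.py.
    loss_dict, params = make_loss_dict()
    sum(loss for loss in loss_dict.values()).backward()
    grads_gen = [p.grad.clone() for p in params]

    # Variant 2: summing the dict view directly.
    loss_dict, params = make_loss_dict()
    sum(loss_dict.values()).backward()
    grads_view = [p.grad.clone() for p in params]

    # Same values, same gradients.
    print(all(torch.equal(a, b) for a, b in zip(grads_gen, grads_view)))  # True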