Hello everyone!
Why are losses combined as

loss_dict = model(images, targets)
losses = sum(loss for loss in loss_dict.values())
optimizer.zero_grad()
losses.backward()

but never as

losses = sum(loss_dict.values())

? Is it the same?
You have a very specific model that returns losses instead of predictions, and it also returns them as a dictionary. Usually models return predictions, then loss functions return the loss(es) as tensor(s), and you can access the value via .item()
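For example, a minimal sketch with a torchvision detection model (the loss keys below are what Faster R-CNN reports; other models may differ):

import torch
import torchvision

# in train mode a detection model returns a dict of loss tensors, not predictions
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
model.train()

images = [torch.rand(3, 300, 400)]
targets = [{"boxes": torch.tensor([[10.0, 20.0, 100.0, 150.0]]),
            "labels": torch.tensor([1])}]

loss_dict = model(images, targets)
print(loss_dict.keys())
# dict_keys(['loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'])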
Thank you for the answer!
As we can see here (vision/engine.py at 59ec1dfd550652a493cb99d5704dcddae832a204 · pytorch/vision · GitHub), the detection models in torchvision return a dictionary of losses in train mode (and, of course, predictions in eval mode). I think this is a common pattern because it encapsulates the losses in the model, which is very convenient.
But anyway, combining losses has a lot of subtleties, because we are combining autograd graphs with gradients, not just values. More details here: python - How can I process multi loss in pytorch? - Stack Overflow
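A tiny sketch of what combining graphs means in practice: the sum of loss tensors is itself a tensor with a grad_fn, so one backward() call accumulates gradients from every term.

import torch

w = torch.randn(3, requires_grad=True)
loss_dict = {"a": (w * 2).sum(), "b": (w ** 2).sum()}

total = sum(loss for loss in loss_dict.values())  # still a tensor with a grad_fn
total.backward()
print(w.grad)  # gradient of 2*w + w**2, i.e. 2 + 2*w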
I guess there is a reason, related to autograd or something else, why everywhere the losses are combined via a Python generator expression, sum(loss for loss in loss_dict_reduced.values()), and not just sum(loss_dict_reduced.values()).
Someone told me it is related to TorchScript, which supposedly does not support summing dictionary values directly. But he was not sure.
Ah, I see. I think this is because dict.values() is returned as an iterable view, and (loss for loss in loss_dict.values()) wraps that view in a generator expression.
Yes, that is true: (loss for loss in loss_dict.values()) wraps the view in a generator. But the question was: why should we wrap the view and then sum it, sum(loss for loss in loss_dict.values())? Why can we not sum the view directly, sum(loss_dict.values())? What is the difference?
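For what it is worth, a quick check with dummy scalar tensors (a minimal sketch) suggests the builtin sum treats both forms identically:

import torch

loss_dict = {"a": torch.tensor(1.0), "b": torch.tensor(2.0)}
s1 = sum(loss for loss in loss_dict.values())  # via a generator expression
s2 = sum(loss_dict.values())                   # over the dict view directly
print(torch.equal(s1, s2))  # True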
Right. That code was written three years ago; I guess torch.sum was not able to sum over an iterator in early 2019.
Maybe.
But in code like this
with torch.cuda.amp.autocast(enabled=scaler is not None):
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

# reduce losses over all GPUs for logging purposes
loss_dict_reduced = utils.reduce_dict(loss_dict)
losses_reduced = sum(loss for loss in loss_dict_reduced.values())
loss_value = losses_reduced.item()
or this
for i, data in enumerate(train_dataloader):
    optimizer.zero_grad()
    images, targets = data[0], data[1]
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)
    loss = sum(loss for loss in loss_dict.values())
    running_loss += loss.item()
    loss.backward()
    optimizer.step()
we cannot use torch.sum at all, because it sums over a tensor, whereas loss_dict.values() is a dict view of separate tensors.
If we google it, we see a lot of examples that use the variant loss = sum(loss for loss in loss_dict.values()) and never the variant sum(loss_dict.values()) (loss_dict.values pytorch - Google Search).
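If one really wanted torch.sum, the values would first have to be stacked into a single tensor. A sketch (my own variant, not from engine.py):

import torch

loss_dict = {"cls": torch.tensor(0.5, requires_grad=True),
             "box": torch.tensor(1.5, requires_grad=True)}

# torch.sum needs a single tensor, so stack the dict values first
total = torch.stack(list(loss_dict.values())).sum()
print(total)  # tensor(2., grad_fn=<SumBackward0>)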
I found another variant here (https://isrc.iscas.ac.cn/gitlab/research/domain-adaption/-/commit/7fb5ee486e519de46175a44f6f67455334d4a549), but they do a conversion too:
loss_dict, _ = model(images, img_metas, targets)
losses = sum(list(loss_dict.values()))
loss_dict_reduced = dist_utils.reduce_dict(loss_dict)
Maybe it is only code style; maybe using the variant without conversion, sum(loss_dict.values()), in our project would not lead to unwanted effects (like errors). But I am not sure.
Right, it is the same. There are a lot of examples of it on GitHub: Search · loss_dict · GitHub
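As a last sanity check, a minimal sketch showing both variants build the same graph and produce identical gradients:

import torch

def grads(use_generator):
    w = torch.ones(2, requires_grad=True)
    loss_dict = {"a": (w * 3).sum(), "b": (w ** 2).sum()}
    total = (sum(l for l in loss_dict.values()) if use_generator
             else sum(loss_dict.values()))
    total.backward()
    return w.grad

print(torch.equal(grads(True), grads(False)))  # True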