I’m using Faster R-CNN to train a network on a custom COCO dataset in a multi-node, multi-GPU configuration. Training seems to work well, but I’m having trouble with validation. This is a multi-label object detection problem, so a single image can contain multiple bounding boxes with different labels.
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for batch_idx, (batch_images, batch_targets) in enumerate(train_dataloader):
        images = list(image.to(local_rank) for image in batch_images)  # Move the images to the GPU
        # Move the annotations to the GPU
        targets = []
        for batch_target in batch_targets:
            boxes = [torch.tensor(bbox['bbox']).to(local_rank) for bbox in batch_target]
            labels = [torch.tensor(bbox['category_id']).to(local_rank) for bbox in batch_target]
            target = {
                'boxes': torch.stack(boxes),
                'labels': torch.stack(labels)
            }
            targets.append(target)
        optimizer.zero_grad()
        loss_dict = model(images, targets)
        loss = sum(loss for loss in loss_dict.values())
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if batch_idx % 10 == 0 and local_rank == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}] Batch [{batch_idx}/{len(train_dataloader)}] Loss: {loss.item()}")
    model.eval()
    with torch.no_grad():
        total_val_loss = 0
        for val_batch_idx, (val_batch_images, val_batch_targets) in enumerate(val_dataloader):
            val_images = list(image.to(local_rank) for image in val_batch_images)  # Move the images to the GPU
            # Move the annotations to the GPU
            val_targets = []
            for batch_target in val_batch_targets:
                boxes = [torch.tensor(bbox['bbox']).to(local_rank) for bbox in batch_target]
                labels = [torch.tensor(bbox['category_id']).to(local_rank) for bbox in batch_target]
                target = {
                    'boxes': torch.stack(boxes),
                    'labels': torch.stack(labels)
                }
                val_targets.append(target)
            loss_dict = model(val_images, val_targets)
            losses = sum(loss for loss in loss_dict.values())
            loss_value = losses.item()
            print("Loss value: ", loss_value)
I get the following error:
losses = sum(loss for loss in loss_dict.values())
AttributeError: 'list' object has no attribute 'values'
The error itself is trivial, but I don’t understand why it appears. I have seen many examples that set up validation this way, so why doesn’t it work in my case? In evaluation mode the model seems to return a list of dictionaries, whereas in training mode model() returns a single dictionary. Why does it return a list of dictionaries, and how should I handle it to retrieve the validation loss? Thanks.
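For context, and assuming the model is torchvision's Faster R-CNN: its detection models return a dictionary of named losses (such as `loss_classifier` and `loss_box_reg`) in training mode, and a list with one prediction dictionary per image (`boxes`, `labels`, `scores`) in eval mode, where the targets are ignored. The sketch below is a hypothetical helper, `sum_detection_losses`, not part of torchvision, that sums the losses when it gets a dictionary and raises a clearer error when it gets predictions instead. Plain floats stand in for loss tensors so the snippet runs without a GPU or dataset.

```python
from collections.abc import Mapping


def sum_detection_losses(output):
    """Sum the values of a loss dict, or fail with a clear message.

    Hypothetical helper (not part of torchvision). It assumes the
    torchvision detection convention: a dict of named losses in training
    mode, a list of per-image prediction dicts in eval mode.
    """
    if isinstance(output, Mapping):
        # Training-mode output, e.g. {'loss_classifier': ..., 'loss_box_reg': ...}
        return sum(output.values())
    if isinstance(output, (list, tuple)):
        # Eval-mode output: predictions ('boxes', 'labels', 'scores'), no losses
        raise TypeError(
            "model returned predictions, not losses; it is probably in eval mode"
        )
    raise TypeError(f"unexpected model output type: {type(output).__name__}")


# Plain floats stand in for loss tensors here; sum() works the same on tensors.
train_like = {"loss_classifier": 0.5, "loss_box_reg": 0.25}
print(sum_detection_losses(train_like))
```

One commonly used workaround for the validation loop is to leave the model in train mode while still wrapping the loop in `torch.no_grad()`, so that `model(val_images, val_targets)` keeps returning the loss dictionary and no gradients are computed. The caveat is that any layers that behave differently in train mode (dropout, unfrozen batch norm) stay in train mode, so the resulting value is an approximation of the validation loss rather than an exact eval-mode measurement.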