Is this an error?

model.train()
itr = 0  # global iteration counter across all epochs
for epoch in range(num_epoch):
    loss_hist.reset()

    for i, (images, targets, ImageIDs) in enumerate(train_loader):
        # Move the batch to the training device.
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In train mode, torchvision detection models return a dict of losses.
        loss_dict = model(images, targets)

        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()

        loss_hist.send(loss_value)

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        if itr % 500 == 0:
            print(f"Iteration #{itr} loss: {loss_value}")

        itr += 1

    if lr_scheduler is not None:
        lr_scheduler.step()

    print(f"Epoch #{epoch} loss: {loss_hist.value}")


22/06/22 15:00:54 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

This warning:

22/06/22 15:00:54 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

seems to be raised by Spark and is thus unrelated to PyTorch. Are you concerned about another issue in your code?
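
For reference, Spark emits that warning when a Window specification has no partitionBy clause, since all rows must then be shuffled to a single partition before the window function runs. A minimal sketch of the cause and the usual fix (the DataFrame and column names here are purely illustrative):

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["ImageID", "step"])

# No partitionBy: Spark logs "No Partition Defined for Window operation!"
# and moves all rows to one partition.
w_unpartitioned = Window.orderBy("step")

# With partitionBy, the window work stays distributed and the warning goes away.
w_partitioned = Window.partitionBy("ImageID").orderBy("step")

df.withColumn("row_num", F.row_number().over(w_partitioned)).show()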


I’m worried about whether the model is actually learning properly.

I cannot see any obvious issues in the posted code.
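
If you want to verify that the model is learning, one simple check is to record loss_hist.value at the end of every epoch and plot the curve; it should trend downward. A minimal sketch (the epoch_losses list and the matplotlib usage are assumptions, not part of the original code):

import matplotlib.pyplot as plt

epoch_losses = []  # append loss_hist.value at the end of each epoch in the loop above

plt.plot(range(len(epoch_losses)), epoch_losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("mean training loss")
plt.title("Training loss per epoch")
plt.savefig("loss_curve.png")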


Thank you. I was worried that training would never finish.