Is this an error?

model.train()
itr = 0  # global iteration counter across all epochs
for epoch in range(num_epoch):
    loss_hist.reset()

    for i, (images, targets, ImageIDs) in enumerate(train_loader):
        # Move the batch to the training device.
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In train mode, torchvision detection models return a dict of losses.
        loss_dict = model(images, targets)

        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()

        loss_hist.send(loss_value)

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        if itr % 500 == 0:
            print(f"Iteration #{itr} loss: {loss_value}")

        itr += 1

    if lr_scheduler is not None:
        lr_scheduler.step()

    print(f"Epoch #{epoch} loss: {loss_hist.value}")


22/06/22 15:00:54 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

This warning:

22/06/22 15:00:54 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

seems to be raised by Spark and is thus unrelated to PyTorch. Are you concerned about another issue in your code?
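
For reference, Spark emits that warning when a Window specification has no partitionBy clause, since all rows must then be shuffled to a single partition before the window function runs. A minimal sketch of the cause and the usual fix (the DataFrame and column names here are purely illustrative):

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["ImageID", "step"])

# No partitionBy: Spark logs "No Partition Defined for Window operation!"
# and moves all rows to one partition.
w_unpartitioned = Window.orderBy("step")

# With partitionBy, the window work stays distributed and the warning goes away.
w_partitioned = Window.partitionBy("ImageID").orderBy("step")

df.withColumn("row_num", F.row_number().over(w_partitioned)).show()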


I’m worried about whether the model is actually learning properly.

I cannot see any obvious issues in the posted code.
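
If you want to verify that the model is learning, one simple check is to record loss_hist.value at the end of every epoch and plot the curve; it should trend downward. A minimal sketch (the epoch_losses list and the matplotlib usage are assumptions, not part of the original code):

import matplotlib.pyplot as plt

epoch_losses = []  # append loss_hist.value at the end of each epoch in the loop above

plt.plot(range(len(epoch_losses)), epoch_losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("mean training loss")
plt.title("Training loss per epoch")
plt.savefig("loss_curve.png")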


Thank you. I was worried that training would never finish.