Object detection model doesn't seem to train, but no error occurs

My model doesn't seem to be learning, and I can't tell what's wrong because no error occurs.

model.train()
for epoch in range(num_epoch):
    loss_hist.reset()

    for i, (images, targets, ImageIDs) in enumerate(train_loader):
        # Move the batch to the target device.
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode the detection model returns a dict of losses.
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()

        # Track the running average of the loss for this epoch.
        loss_hist.send(loss_value)

        # Standard optimization step.
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        if i % 100 == 0:
            print("epoch:", epoch, "iteration:", i, "loss:", losses.item())

/Users/nagaoyuuta/opt/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will change to match the args list in nn.MaxPool2d in a future release.
warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"

epoch: 0 iteration: 0 loss: 66.70726178733568

NagaYu - is that your entire stdout? It looks like you’re running your model for a small number of steps (< 100) since you’re only printing out the loss once.


Yes, that is the entire output. What about the output layer? Do the loss functions need to be the same?

It seems like you are training your model for too few steps. You need to train for more steps before the training loss converges. Your current example only includes output from the very first iteration, when the model is still completely untrained.
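Side note: since your loop calls loss_hist.reset() and loss_hist.send(...), loss_hist looks like some kind of running-average tracker. A minimal sketch of such a helper is below; the Averager name and its value property are my assumptions, not necessarily your implementation:

# Minimal running-average tracker, sketched to match the reset()/send() calls
# in your loop. Names here are illustrative, not taken from your code.
class Averager:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def send(self, value):
        # Accumulate one scalar loss value.
        self.total += value
        self.count += 1

    @property
    def value(self):
        # Mean of all values sent since the last reset.
        return self.total / self.count if self.count > 0 else 0.0

    def reset(self):
        self.total = 0.0
        self.count = 0

Printing that average at the end of every epoch, e.g. print("epoch:", epoch, "mean loss:", loss_hist.value) or whatever the equivalent is on your helper, gives you a per-epoch trend, which tells you much more about convergence than a single iteration-0 print.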


Do I need to train for more than 100 k steps?

If this is the only output you get from your script:

epoch: 0 iteration: 0 loss: 66.70726178733568

Then it means you only ran your training for at most 100 steps, since otherwise this condition would have triggered again and you would have gotten another line printed out:

if i % 100 == 0:
    print("epoch:", epoch, "iteration:", i, "loss:", losses.item())

Maybe this is not the problem and you truncated your output before posting it here. But if you didn't truncate, you should double-check that your train_loader actually contains as much data as you think and that num_epoch is high enough. If you did truncate, please post more of the output (that is, losses for more iterations and more epochs than just epoch=0, iteration=0).
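One quick sanity check (just a sketch, assuming train_loader is a standard torch.utils.data.DataLoader over a map-style dataset; the variable names below are only illustrative):

# How many batches per epoch, total steps, and how many loss lines the
# i % 100 == 0 condition should produce across the whole run.
num_batches = len(train_loader)                            # batches per epoch
total_steps = num_batches * num_epoch                      # total iterations
expected_prints = num_epoch * ((num_batches + 99) // 100)  # printed loss lines
print("batches per epoch:", num_batches)
print("total training steps:", total_steps)
print("expected loss printouts:", expected_prints)

If expected_prints comes out as 1, your loader (or num_epoch) is smaller than you think.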

Put differently, what you posted does not show whether or not your model is training.


I understand. I will check the places you pointed out. Thank you for the thorough explanation.