My model can’t even overfit two images, and the loss doesn’t decrease!

Epoch [1/5], Iter [1/51200] TotalLoss: 14.3430 Loss1: 3.4093 Loss2: 3.7797 Loss3: 3.4102 Loss4: 3.7438
Epoch [1/5], Iter [2/51200] TotalLoss: 9.1436 Loss1: 2.0375 Loss2: 2.9503 Loss3: 2.7624 Loss4: 1.3935
Epoch [1/5], Iter [3/51200] TotalLoss: 12.7019 Loss1: 3.1669 Loss2: 3.1706 Loss3: 3.1461 Loss4: 3.2183
Epoch [1/5], Iter [4/51200] TotalLoss: 199.0285 Loss1: 45.2067 Loss2: 60.3769 Loss3: 52.5529 Loss4: 40.8920
Epoch [1/5], Iter [5/51200] TotalLoss: 13.5216 Loss1: 3.3813 Loss2: 3.3809 Loss3: 3.3761 Loss4: 3.3833
Epoch [1/5], Iter [6/51200] TotalLoss: 12.3115 Loss1: 3.0906 Loss2: 3.0759 Loss3: 3.0606 Loss4: 3.0844
Epoch [1/5], Iter [7/51200] TotalLoss: 9.6911 Loss1: 2.4250 Loss2: 2.0567 Loss3: 2.1717 Loss4: 3.0377
Epoch [1/5], Iter [8/51200] TotalLoss: 11.4812 Loss1: 2.8772 Loss2: 2.8731 Loss3: 2.8619 Loss4: 2.8690
Epoch [1/5], Iter [9/51200] TotalLoss: 7.7518 Loss1: 1.9965 Loss2: 1.8900 Loss3: 1.9414 Loss4: 1.9240
Epoch [1/5], Iter [10/51200] TotalLoss: 5.8163 Loss1: 1.3725 Loss2: 1.4375 Loss3: 1.5680 Loss4: 1.4383
Epoch [1/5], Iter [11/51200] TotalLoss: 5.2633 Loss1: 1.3306 Loss2: 1.3322 Loss3: 1.2640 Loss4: 1.3365
Epoch [1/5], Iter [12/51200] TotalLoss: 3.2123 Loss1: 0.8192 Loss2: 0.7681 Loss3: 0.7401 Loss4: 0.8849
Epoch [1/5], Iter [13/51200] TotalLoss: 2.6675 Loss1: 0.6967 Loss2: 0.6703 Loss3: 0.6426 Loss4: 0.6579
Epoch [1/5], Iter [14/51200] TotalLoss: 2.8774 Loss1: 0.7332 Loss2: 0.7385 Loss3: 0.6811 Loss4: 0.7245
Epoch [1/5], Iter [15/51200] TotalLoss: 2.8546 Loss1: 0.7366 Loss2: 0.7264 Loss3: 0.7202 Loss4: 0.6713
Epoch [1/5], Iter [16/51200] TotalLoss: 2.7758 Loss1: 0.6806 Loss2: 0.6549 Loss3: 0.7345 Loss4: 0.7058
Epoch [1/5], Iter [17/51200] TotalLoss: 2.8139 Loss1: 0.7092 Loss2: 0.6574 Loss3: 0.7800 Loss4: 0.6674
Epoch [1/5], Iter [18/51200] TotalLoss: 2.7282 Loss1: 0.6225 Loss2: 0.6997 Loss3: 0.7237 Loss4: 0.6823
Epoch [1/5], Iter [19/51200] TotalLoss: 2.8259 Loss1: 0.7428 Loss2: 0.7204 Loss3: 0.6845 Loss4: 0.6783
Epoch [1/5], Iter [20/51200] TotalLoss: 2.5689 Loss1: 0.6701 Loss2: 0.6525 Loss3: 0.6837 Loss4: 0.5626
Epoch [1/5], Iter [21/51200] TotalLoss: 2.7335 Loss1: 0.6944 Loss2: 0.7051 Loss3: 0.6600 Loss4: 0.6740
Epoch [1/5], Iter [22/51200] TotalLoss: 2.8239 Loss1: 0.6977 Loss2: 0.7478 Loss3: 0.6584 Loss4: 0.7199
Epoch [1/5], Iter [23/51200] TotalLoss: 2.7279 Loss1: 0.7237 Loss2: 0.6032 Loss3: 0.6741 Loss4: 0.7270
Epoch [1/5], Iter [24/51200] TotalLoss: 2.6124 Loss1: 0.6750 Loss2: 0.6018 Loss3: 0.6210 Loss4: 0.7146
Epoch [1/5], Iter [25/51200] TotalLoss: 2.9644 Loss1: 0.7071 Loss2: 0.6792 Loss3: 0.7112 Loss4: 0.8669
Epoch [1/5], Iter [26/51200] TotalLoss: 3.0921 Loss1: 0.6923 Loss2: 0.7369 Loss3: 0.6726 Loss4: 0.9902
Epoch [1/5], Iter [27/51200] TotalLoss: 3.1947 Loss1: 0.7637 Loss2: 0.8234 Loss3: 0.7893 Loss4: 0.8183
Epoch [1/5], Iter [28/51200] TotalLoss: 2.9257 Loss1: 0.6981 Loss2: 0.7647 Loss3: 0.6602 Loss4: 0.8028
My network is a VGG-like architecture and the task is CAPTCHA recognition;
it’s a multi-label task using a CNN.
But it didn’t work: it cannot fit even two images after a lot of training steps.
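
(For context: the four loss terms in the log suggest one classification head per character position, summed into TotalLoss, and the initial per-head values of ~3.4–3.8 sit near ln(36) ≈ 3.58, the expected cross-entropy of an untrained 36-way classifier. A minimal sketch of that kind of setup follows; the character count, class count, and feature size are illustrative guesses, not the actual model:)

import torch
import torch.nn as nn

# Hypothetical sketch: fixed-length CAPTCHA with 4 characters, each one of
# 36 classes (digits + letters); one classification head per position.
num_chars, num_classes = 4, 36
features = torch.randn(8, 512)  # stand-in for the CNN's pooled features
heads = nn.ModuleList(nn.Linear(512, num_classes) for _ in range(num_chars))
targets = torch.randint(num_classes, (8, num_chars))  # one class index per position

criterion = nn.CrossEntropyLoss()
losses = [criterion(head(features), targets[:, i]) for i, head in enumerate(heads)]
total_loss = sum(losses)  # mirrors TotalLoss = Loss1 + Loss2 + Loss3 + Loss4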

How is your loss function defined?

I have a similar problem: a simple model that won’t overfit, despite not using regularization techniques like dropout.

My model looks as follows:

Net(
  (feature_extractor): Sequential(
    (0): Conv2d(1, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (3): ReLU()
    (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (7): ReLU()
    (8): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=65536, out_features=256, bias=True)
    (1): ReLU()
    (2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Linear(in_features=256, out_features=1, bias=True)
  )
)
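
For completeness, a module definition that would produce this printout is sketched below. The flatten happens in forward, and in_features=65536 = 256 × 16 × 16 implies 1×64×64 inputs after the two pooling stages (an inference from the shapes, not something the printout states):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(256),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(65536, 256),  # 256 * 16 * 16, assuming 64x64 inputs
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        x = self.feature_extractor(x)
        x = x.view(x.size(0), -1)   # flatten to (N, 65536)
        return self.classifier(x)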

and my training loop looks as follows:

import time

import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim

lr = np.linspace(0.01, 1. / (1000 ** 2), 25)  # per-epoch learning-rate schedule

torch.backends.cudnn.benchmark = True
torch.backends.cudnn.fastest = True

start = time.time()
model.train()
train_loss = []
train_accu = []
i = 0
for epoch in range(25):
    optimizer = optim.Adam(model.parameters(), lr=lr[epoch])
    for image, label in train_loader:
        data, target = image.float().cuda(), label.float().cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = F.mse_loss(output, target.view(output.size()))
        loss.backward()                 # compute gradients
        train_loss.append(loss.item())  # record the loss value
        optimizer.step()                # update parameters
        prediction = torch.floor(output).int()

        accuracy = (prediction.eq(target.view(output.size()).int()).sum().item() / data.size(0)) * 100
        train_accu.append(accuracy)
        if i % 10 == 0:
            print('Epoch:', str(epoch), 'Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.item(), accuracy))
            #print('TARGET:', target)
            #print('OUTPUT:', output.view(target.size()))
        i += 1
end = time.time()
print('TRAIN TIME:')
print('%.2gs' % (end - start))

I would have thought that after 10 iterations the model would at least have started to climb in accuracy towards overfitting, but so far the accuracy just bounces around below ~30%.
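
(The check I’m effectively running is the standard one: any reasonably sized model should be able to drive the loss to near zero when trained repeatedly on a single fixed batch. A self-contained sketch of that sanity check, with a stand-in model and random data rather than my actual setup:)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Stand-in model and data: the point is only that a sufficiently
# large model should drive the loss to ~0 on one fixed batch.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
data = torch.randn(2, 20)    # two fixed samples ("two images")
target = torch.randn(2, 1)

optimizer = optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    optimizer.zero_grad()
    loss = F.mse_loss(model(data), target)
    loss.backward()
    optimizer.step()

print(loss.item())  # should be near 0; if not, something in the pipeline is broken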

Loss:
[plot of training loss]
Accuracy:
[plot of training accuracy]

TL;DR:

mse_loss
Adam optimizer
Learning rate decay
No dropout (but do include batch norm)
Not overfitting after 10 epochs.

Thanks in advance for any guidance on this issue!

Could you try changing the learning rate with a single optimizer, either by setting its param_groups[0]['lr'] directly or by using torch.optim.lr_scheduler?
Currently you are re-initializing the optimizer in each epoch, so it loses its accumulated momentum and other internal state.
Maybe it won’t have a big effect, but it’s worth a first shot!
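
For the manual route, something like this (the decay schedule here is just a placeholder, and the model is a stand-in):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # stand-in for your Net
optimizer = optim.Adam(model.parameters(), lr=0.01)  # created once, outside the epoch loop

lrs = [0.01 * (0.9 ** e) for e in range(25)]  # illustrative decay schedule
for epoch in range(25):
    for group in optimizer.param_groups:  # update the lr in place, keeping the optimizer state
        group['lr'] = lrs[epoch]
    # ... run one epoch of training with this optimizer ...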

Thanks for the reply, I’m now using:

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

and for training I simply include the line:

scheduler.step()

under the for epoch in range(25): loop.
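
So the loop now looks roughly like this sketch (model and train_loader as before; note that in newer PyTorch versions, i.e. >= 1.1, the convention is to call scheduler.step() after the epoch’s optimizer.step() calls):

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

for epoch in range(25):
    for image, label in train_loader:
        # ... forward pass, loss, loss.backward(), optimizer.step() ...
        pass
    scheduler.step()  # multiplies the lr by gamma once per epoch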

I will update once it has undergone all its iterations…

UPDATE:

That hasn’t had an effect on improving training accuracy, unfortunately, but thanks for pointing out that bug!

Have you solved the problem? I’m running into the same issue: I tried changing the network structure, the learning rate, and the training strategy, but it still doesn’t work.
My loss curve is shown in the following picture:

[plot of training loss]
Thanks in advance!

I could be wrong, but I don’t think ‘overfit’ is really the right term here, either in your question or in the original post.