Model loss increases during training

I'm trying to train a model to drive a robot.
The input is an RGB image
and there are 4 outputs (turn, throttle, m1, m2).
The batch size is always 1; if this is a problem, please help me solve it.
Here is some loss information.
I think a normal loss for this model is 0.05-0.1.
lr=0.025
{
Epoch:0 , loss:7737375380554.846
Epoch:1 , loss:7737643846993.172
at epoch 7 the loss was 2.5
}
The LR is too high, let's try a smaller one!
lr=0.001
{
Epoch:0 , loss:72.43417893535843
Epoch:1 , loss:72.60916486998268
Epoch:2 , loss:72.69703222041555
Epoch:3 , loss:72.75141251493582
Epoch:4 , loss:72.79511734658347
Epoch:5 , loss:7905.47881773466
Epoch:6 , loss:7910.287118291918
}
Hmm, still too big.
lr=0.0001
{
Epoch:1 , loss:0.04062871445124512
Epoch:2 , loss:0.08202989919366599
Epoch:3 , loss:0.11867608327466399
Epoch:4 , loss:0.15182475683913288
Epoch:5 , loss:0.1840306407139207
Epoch:6 , loss:0.21269866782967778
Epoch:7 , loss:0.2390143628541438
Epoch:8 , loss:0.26148748367000685
Epoch:9 , loss:0.28192748126230427
Epoch:10 , loss:0.30146832446190663
}
Maybe it's better, but the error is still increasing instead of decreasing.
lr=0.00001
{
Epoch:1 , loss:0.056156842192071454
Epoch:2 , loss:0.10663441620528054
Epoch:3 , loss:0.15288402274336577
Epoch:4 , loss:0.19622855389989446
Epoch:5 , loss:0.23698536783349802
Epoch:6 , loss:0.2753670450382031
Epoch:7 , loss:0.31140105827773595
Epoch:8 , loss:0.34510847057268446
Epoch:9 , loss:0.3766837950037029
Epoch:10 , loss:0.40588362538347367
}
I don't know why it's getting worse.

Here is the model training code. Yes, I know the correct way to compute the loss is to run validation after every epoch, but it's too slow.

import time

import torch
import torch.nn as nn
import torch.optim as optim

model = CNN()
model.train()
model.to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
#optimizer=optim.SGD(model.parameters(),lr=learning_rate,momentum=0,nesterov=False)
loss_list = []
ep_loss=[]
sr_loss=0
loss = 0
for epoch in range(num_epochs):
    i = 0
    for data in train_loader:
        i += 1

        in_images, out_images = data
        in_images=in_images.to(device)
        out_images=out_images.to(device)

        optimizer.zero_grad()

        outputs = model(in_images) 
       
        # out_images[0] matches the output shape only because the batch size is 1
        loss = criterion(outputs, out_images[0])

        loss.backward()
        
        optimizer.step()
        if i % 20 == 0:
            print("Loss = ", round(loss.item(), 4), i, end="\r")
        sr_loss += loss.item()
    loss_list.append(sr_loss/len(train_X))
    #test(val_X,model,[val_Y1,val_Y2,val_Y3,val_Y4])
    print(f'Epoch:{epoch} , loss:{sr_loss/len(train_X)}')

    torch.save(model.state_dict(), "model_pytorch_1.plk")

    time.sleep(5)
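
For what it's worth, a cheap validation pass would look something like the sketch below (val_loader is hypothetical here, built the same way as train_loader), but even with torch.no_grad() it is still too slow for me to run every epoch.

def validate(model, val_loader, criterion, device):
    model.eval()  # disable dropout for a stable estimate
    total = 0.0
    with torch.no_grad():  # no gradient bookkeeping, so much cheaper than a training step
        for in_images, out_images in val_loader:
            in_images = in_images.to(device)
            out_images = out_images.to(device)
            outputs = model(in_images)
            total += criterion(outputs, out_images[0]).item()
    model.train()  # switch back before the next training epoch
    return total / len(val_loader)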

Here is the model

class CNN(nn.Module):
    def __init__(self, in_channels=3, num_classes=4):
        super(CNN, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=16, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
        #self.pool = nn.MaxPool2d(kernel_size=(24, ), stride=(2, 2))
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
        self.conv4 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
        self.drop = nn.Dropout(p=0.2)
    
        self.fc1 = nn.Linear(6912, 1024)
       # self.rnn1=nn.LSTM(input_size=1024,hidden_size=1024,num_layers=1,batch_first=False)
        self.fc2 = nn.Linear(1024, num_classes)
        
        # self.fc1 = nn.Linear(128*1*1, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        #x = self.pool(x)
        x = self.conv2(x)
        # x = self.pool(x)
        x = self.conv3(x)
        # x = self.pool(x)
        x = self.conv4(x)
        #x = self.pool(x)
        x = self.drop(x)
        x = torch.flatten(x)  # flattens ALL dims, including batch; works only with batch size 1
        x = self.fc1(x)
        #x=self.rnn1(x)
        x = self.fc2(x)
        return x
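
About the batch size: I think it's stuck at 1 because torch.flatten(x) flattens every dimension, including the batch one (that's also why the training loop compares against out_images[0]). If I wanted batches larger than 1, I guess the forward pass would have to keep the batch dimension, something like this sketch:

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.drop(x)
        x = torch.flatten(x, start_dim=1)  # keep dim 0 (the batch), flatten the rest
        x = self.fc1(x)
        x = self.fc2(x)
        return x

# ...and then the loss would compare whole batches:
# loss = criterion(outputs, out_images)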

Please help me make this model learn correctly. If you need the dataset and code, message here and I will send them to your email.
P.S. Sorry for the mistakes, my English is bad.

I think the issue is this line: sr_loss/len(train_X). I’m assuming that len(train_X) is a constant and sr_loss is defined to be 0 at the beginning of training.

But you don’t ever zero it out. It’s always being added onto. So it’s just getting larger and larger as time goes on, regardless of how training is going.
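
A minimal sketch of the fix: reset the accumulator at the top of each epoch, everything else unchanged:

for epoch in range(num_epochs):
    sr_loss = 0.0  # reset the running sum here, once per epoch
    for in_images, out_images in train_loader:
        in_images = in_images.to(device)
        out_images = out_images.to(device)

        optimizer.zero_grad()
        outputs = model(in_images)
        loss = criterion(outputs, out_images[0])
        loss.backward()
        optimizer.step()

        sr_loss += loss.item()
    print(f'Epoch:{epoch} , loss:{sr_loss / len(train_X)}')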


Thanks! I will fix it right now.

Epoch:1 , loss:0.04277728414539726
Epoch:2 , loss:0.0406925378030401
Epoch:3 , loss:0.03725993632761486
Epoch:4 , loss:0.03740865272509567
Epoch:5 , loss:0.029603138416451756
Epoch:6 , loss:0.02794443646247365
Epoch:7 , loss:0.028427238596042553