My network’s loss stays the same; no matter what I do, it does not change at all

nivesh_gadipudi · September 18, 2020, 12:57pm

class Network(nn.Module):
    def __init__(self):
    def forward(self, img):
    def init_weights(self):
    def training_step(self, batch):
        img_stack1, labels1, img_stack2, labels2 = batch
        # the same network (shared parameters) processes both image stacks
        out1 = self(img_stack1)
        out2 = self(img_stack2)
        loss1 = loss(out1[:, 0:3].unsqueeze(1), labels1)
        loss2 = loss(out2[:, 0:3].unsqueeze(1), labels2)
        return loss1 + loss2

def train(epochs, model, train_loader, optimizer = .....):
    optimizer = optim(model.parameters(), lr = 0.001)
    model.init_weights()
    for epoch in range(1, epochs+1):
        loss_accumulator = 0.0
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model.training_step(batch)
            loss.backward()

I have posted my code above. I am using shared network parameters. Whatever I do, the loss returns a value of about 11.00. My __getitem__() and train loader are working fine. Can someone help me figure out what I am doing wrong? Thank you.

ptrblck · September 19, 2020, 8:08am

Based on the provided code, you are not zeroing out the gradients and are never calling optimizer.step(), which would update the parameters.

nivesh_gadipudi · September 19, 2020, 5:51pm

I’m sorry, I left that out of the code above.

But this is exactly what I used, and even then the loss does not change.

In train loop:

optimizer.zero_grad()
loss = model.training_step(batch)
loss.backward()
optimizer.step()

nivesh_gadipudi · September 19, 2020, 5:56pm

And it’s weird that whatever I do, the loss doesn’t change at all; it gives the exact same 11 every time.

ptrblck · September 20, 2020, 2:44am

Could you check, if your model gets valid gradients after the loss.backward() call via:

for name, param in model.named_parameters():
    print(name, param.grad)

If you see a None output it would mean that these parameters won’t be updated as no gradient is calculated for them and thus your computation graph might have been detached at one point.

XiaoLin_on_the_way · September 20, 2020, 6:15am

Have you tried changing the function?

nivesh_gadipudi · September 20, 2020, 6:55am

No layer in the entire computation graph prints ‘None’.

ptrblck · September 21, 2020, 5:47am

In that case all parameters get valid gradients and your overall training might be “stuck”.
Try to overfit a small dataset (e.g. just 10 samples) by playing around with the hyperparameters.
Once this is done you can try to scale up the problem.
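A minimal sketch of the overfitting test suggested above. The random data, the tiny model, and the hyperparameters are hypothetical stand-ins, not the poster’s setup; the point is only that a working training loop should be able to drive the loss on 10 samples well below its starting value.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 100 samples, 8 features, 6 regression targets.
full_dataset = TensorDataset(torch.randn(100, 8), torch.randn(100, 6))

# Keep only the first 10 samples and try to memorize them.
small_loader = DataLoader(Subset(full_dataset, range(10)), batch_size=10)

# Hypothetical tiny model; substitute the real network here.
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 6))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(200):
    for inputs, targets in small_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    # one batch per epoch here, so this is the epoch loss
    losses.append(loss.item())

print(f"first epoch: {losses[0]:.4f}, last epoch: {losses[-1]:.4f}")
```

If the loss does not move even in a setting like this, the problem is more likely in the loop, the loss, or the data than in the amount of training.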

nivesh_gadipudi · October 22, 2020, 11:14pm

I tried every possible way, but the loss remains constant. I checked the computation graph, the loss function’s dimensions… I feel like I’m stuck with this. I’m trying to replicate this: https://www.hindawi.com/journals/complexity/2020/6367273/

ptrblck · October 22, 2020, 11:49pm

Could you post the model definition, please?

nivesh_gadipudi · October 23, 2020, 1:44am

class Dscnvo(nn.Module):
    def __init__(self):
        super(Dscnvo, self).__init__()
        self.conv1   = nn.Conv2d(6, 64, 7, 2, 3)
        self.conv2   = nn.Conv2d(64, 128, 5, 2, 2)
        self.conv3   = nn.Conv2d(128, 256, 5, 2, 2)
        self.conv3_1 = nn.Conv2d(256, 256, 3, 1, 1)
        self.conv4   = nn.Conv2d(256, 512, 3, 2, 1)
        self.conv4_1 = nn.Conv2d(512, 512, 3, 1, 1)
        self.conv5   = nn.Conv2d(512, 512, 3, 2, 1)
        self.conv5_1 = nn.Conv2d(512, 512, 3, 1, 1)
        self.conv6   = nn.Conv2d(512, 1024, 3, 2, 1)
        self.conv6_1 = nn.Conv2d(1024, 1024, 2, 2, 0)
        self.maxpool = nn.MaxPool2d((2, 2), stride = (2, 2))
        self.fc1     = nn.Linear(5120, 4096)
        self.fc2     = nn.Linear(4096, 1024)
        self.fc3     = nn.Linear(1024, 128)
        self.fc4     = nn.Linear(128, 6)
        self.init_weights()
    
    def forward(self, img):
        x = F.relu(self.conv1(img))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv3_1(x))
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv4_1(x))
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv5_1(x))
        x = F.relu(self.conv6(x))
        x = self.conv6_1(x)
        x = self.maxpool(x)
        x = x.view(-1, 5120)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d or nn.Linear):
                nn.init.xavier_normal_(m.weight)

Above is my model. The input is a stack of images with dim [batch_size, 6, 384, 1280]; the output has dim [batch_size, 6].
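As a sanity check on the stated shapes, the spatial size after each layer can be worked out with the standard output-size formula for convolution and pooling (floor mode, dilation 1). This is a plain-Python sketch; the helper name `conv_out` is made up for illustration. The (kernel, stride, padding) tuples are read off the model definition above.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for conv1 ... conv6_1, then the max-pool layer
layers = [
    (7, 2, 3), (5, 2, 2), (5, 2, 2), (3, 1, 1), (3, 2, 1), (3, 1, 1),
    (3, 2, 1), (3, 1, 1), (3, 2, 1), (2, 2, 0),
    (2, 2, 0),  # maxpool
]

h, w = 384, 1280
for kernel, stride, padding in layers:
    h = conv_out(h, kernel, stride, padding)
    w = conv_out(w, kernel, stride, padding)

flat_features = 1024 * h * w  # 1024 channels after conv6_1
print(h, w, flat_features)    # prints: 1 5 5120
```

So for this input size, `x.view(-1, 5120)` matches the flattened feature count and does not mix samples across the batch dimension.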

def forward_pass(batch):
    imgf, imgr, tf, af = batch
    imgf, imgr, tf, af = imgf.to(device),imgr.to(device), tf.to(device), af.to(device)
    #imgr = torch.cat((imgf[:, 3:6, :, :],imgf[:, 0:3, :, :]), 1)
    tr   = tf*-1
    ar   = af*-1
    outf = model(imgf)
    outr = model(imgr)
    loss1 = (alpha1*(criterion(outf[:, 0:3].unsqueeze(1), tf)) + beta1*(criterion(outf[:, 3:6].unsqueeze(1), af)))
    loss2 = (alpha2*(criterion(outr[:, 0:3].unsqueeze(1), tr)) + beta2*(criterion(outr[:, 3:6].unsqueeze(1), ar)))
    loss  = loss1 + loss2
    return loss, outf, tf, af

for epoch in range(1, num_epochs+1):
    running_loss = 0.0
    for i, batch in enumerate(trainloader):
        #forward
        loss, outf, tf, af = forward_pass(batch)

        #backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

ptrblck · October 23, 2020, 3:55am

Thanks for the code. Your model is quite deep, so it might suffer from e.g. vanishing gradients.
I would scale it down a bit and try to overfit a small data sample with it.
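One way to look for vanishing gradients is to print the norm of each parameter’s gradient after `loss.backward()` instead of the full tensors. The sketch below uses a hypothetical small model and random data; with a real network, norms that shrink toward zero in the early layers would point in that direction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small stack of layers; substitute the real model.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 6),
)

inputs, targets = torch.randn(4, 16), torch.randn(4, 6)
loss = nn.MSELoss()(model(inputs), targets)
loss.backward()

# Collect the L2 norm of every gradient that was actually computed.
grad_norms = {
    name: param.grad.norm().item()
    for name, param in model.named_parameters()
    if param.grad is not None
}
for name, norm in grad_norms.items():
    print(f"{name:12s} grad norm = {norm:.3e}")
```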

nivesh_gadipudi · October 25, 2020, 4:39pm

Can you post a link on using Ray Tune for hyperparameter tuning against the validation loss, to avoid overfitting the model?

Thank you very much in advance

ptrblck · October 26, 2020, 1:10am

I’m unfortunately not familiar with Ray Tune and don’t know any good resources.

richardliaw · October 29, 2020, 8:03am

Hey @nivesh_gadipudi, just wanted to repost here (I work on Ray Tune):

An easy way to prevent overfitting is simply to return “done=True” when the validation loss begins to diverge.

tune.report(done=True, ...)

# or if using class API:
def step(self):
   ...
   return {"done": True, ...}

Hope that helps!
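How to decide that the validation loss has “begun to diverge” is left open above. One common choice is a patience rule, sketched here in plain Python; the helper `should_stop` and the loss values are hypothetical and are not part of Ray Tune. Its result could be used as the value of the “done” flag.

```python
def should_stop(val_losses, patience=3):
    """Return True once the last `patience` epochs have not improved on
    the best validation loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

# Made-up validation losses: improving at first, then getting worse.
history = []
stopped_at = None
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1]):
    history.append(val_loss)
    if should_stop(history):
        stopped_at = epoch
        break

print(stopped_at)  # prints: 5
```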

nivesh_gadipudi · October 29, 2020, 8:17am

Thanks @richardliaw for your suggestions

nivesh_gadipudi · December 11, 2020, 4:22pm

@ptrblck I’m puzzled: when I removed init_weights() from __init__(), the model started learning. But __init__() only gets executed once, when the model is instantiated, right?

Could you please elaborate on this, many thanks in advance.

ptrblck · December 11, 2020, 10:57pm

I guess the default initializations might be more beneficial for your particular use case than your custom one, which would explain the better convergence after removing the init_weights() call.
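One detail of the `init_weights()` method posted earlier in the thread may be worth noting here: `isinstance(m, nn.Conv2d or nn.Linear)` does not check for both types. In Python, `A or B` evaluates to `A` when `A` is truthy, so the expression reduces to `isinstance(m, nn.Conv2d)` and the custom Xavier initialization only reaches the convolution layers, while the linear layers keep PyTorch’s defaults. Whether this contributed to the constant loss is not established in the thread; the sketch below only demonstrates the language behavior.

```python
import torch.nn as nn

# `A or B` returns A when A is truthy, and classes are truthy.
print((nn.Conv2d or nn.Linear) is nn.Conv2d)        # True

linear = nn.Linear(128, 6)
matched_with_or = isinstance(linear, nn.Conv2d or nn.Linear)
matched_with_tuple = isinstance(linear, (nn.Conv2d, nn.Linear))

print(matched_with_or)     # False: the linear layer is skipped
print(matched_with_tuple)  # True: a tuple checks against both types
```

Passing a tuple of types to `isinstance` is the usual way to test against several classes at once.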