My network’s loss stays the same; no matter what I do, it’s not changing at all.

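import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        ...  # body not shown in the post
    def forward(self, img):
        ...
    def init_weights(self):
        ...
    def train_step(self, batch):
        # both image stacks go through the same network (shared parameters)
        img_stack1, labels1, img_stack2, labels2 = batch
        out1 = self(img_stack1)
        out2 = self(img_stack2)
        # use `criterion` for the loss function; naming it `loss` shadows the result
        loss1 = criterion(out1[:, 0:3].unsqueeze(1), labels1)
        loss2 = criterion(out2[:, 0:3].unsqueeze(1), labels2)
        return loss1 + loss2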

def train(epochs, model, train_loader, optimizer=...):
    optimizer = optim(model.parameters(), lr=0.001)
    for epoch in range(1, epochs + 1):
        loss_accumulator = 0.0
        for batch in train_loader:
            loss = model.train_step(batch)

I have posted my code above; I am using shared network parameters. Whatever I do, the loss stays at about 11.00. My `__getitem__()` and train loader are working fine. Can someone help me figure out what I am doing wrong? Thank you.

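Based on the provided code, you are never calling `loss.backward()` to compute the gradients, you are not zeroing out the gradients with `optimizer.zero_grad()`, and you are never calling `optimizer.step()`, which would update the parameters.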
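For reference, a minimal sketch of a training step with `zero_grad()`, `backward()`, and `step()` in the usual order. The tiny linear model, random data, and `SGD` optimizer are placeholders for illustration, not the poster's setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model, loss, and optimizer (assumptions for illustration)
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)

def train_step(inputs, targets):
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass
    loss.backward()                           # compute new gradients
    optimizer.step()                          # update the parameters
    return loss.item()

losses = [train_step(inputs, targets) for _ in range(20)]
print(losses[0], losses[-1])
```

If any of the three calls is missing, the parameters either never change or are updated with stale, accumulated gradients.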

I’m sorry, I didn’t include that in the code above.

But this is exactly what I used, and even then the loss does not change.

In the training loop:

loss = model.train_step(batch)

It’s weird: whatever I do, the loss doesn’t change at all. It’s exactly 11 every time.

Could you check whether your model gets valid gradients after the `loss.backward()` call via:

for name, param in model.named_parameters():
    print(name, param.grad)

If you see `None` printed for a parameter, no gradient was calculated for it and it won’t be updated; this would mean that your computation graph might have been detached at some point.
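The same check can be automated. A minimal sketch using a toy model in which one branch is deliberately cut from the graph with `.detach()`, to show what a detached graph looks like; `TwoBranch` and `params_without_grad` are hypothetical names for illustration:

```python
import torch
import torch.nn as nn

class TwoBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(3, 1)
        self.b = nn.Linear(3, 1)

    def forward(self, x):
        # .detach() removes self.b from the graph, so it gets no gradient
        return self.a(x) + self.b(x).detach()

def params_without_grad(model):
    # Names of trainable parameters whose .grad is still None after backward()
    return [name for name, p in model.named_parameters()
            if p.requires_grad and p.grad is None]

model = TwoBranch()
loss = model(torch.randn(4, 3)).pow(2).mean()
loss.backward()
print(params_without_grad(model))  # ['b.weight', 'b.bias']
```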

Have you ever changed the function?

No layer in the entire computation graph prints `None`.

In that case all parameters receive valid gradients, and your overall training might be “stuck”.
Try to overfit a small dataset (e.g. just 10 samples) by playing around with the hyperparameters.
Once this works, you can try to scale up the problem.
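A minimal sketch of that sanity check, assuming a small stand-in MLP, random data, and Adam. With only 10 fixed samples, a working setup should drive the training loss close to zero; a loss that does not move at all points to a bug rather than to poorly chosen hyperparameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 10 fixed samples; model and data are placeholders, not the poster's setup
x = torch.randn(10, 16)
y = torch.randn(10, 6)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 6))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    if step == 0:
        first_loss = loss.item()

final_loss = loss.item()
print(f"first: {first_loss:.4f}, final: {final_loss:.6f}")
```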

I’ve tried everything, but the loss remains constant. I checked the computation graph and the dimensions passed to the loss functions… I feel like I’m stuck with this. I’m trying to replicate this

Could you post the model definition, please?

import torch.nn as nn
import torch.nn.functional as F

class Dscnvo(nn.Module):
    def __init__(self):
        super(Dscnvo, self).__init__()
        self.conv1   = nn.Conv2d(6, 64, 7, 2, 3)
        self.conv2   = nn.Conv2d(64, 128, 5, 2, 2)
        self.conv3   = nn.Conv2d(128, 256, 5, 2, 2)
        self.conv3_1 = nn.Conv2d(256, 256, 3, 1, 1)
        self.conv4   = nn.Conv2d(256, 512, 3, 2, 1)
        self.conv4_1 = nn.Conv2d(512, 512, 3, 1, 1)
        self.conv5   = nn.Conv2d(512, 512, 3, 2, 1)
        self.conv5_1 = nn.Conv2d(512, 512, 3, 1, 1)
        self.conv6   = nn.Conv2d(512, 1024, 3, 2, 1)
        self.conv6_1 = nn.Conv2d(1024, 1024, 2, 2, 0)
        self.maxpool = nn.MaxPool2d((2, 2), stride = (2, 2))
        self.fc1     = nn.Linear(5120, 4096)
        self.fc2     = nn.Linear(4096, 1024)
        self.fc3     = nn.Linear(1024, 128)
        self.fc4     = nn.Linear(128, 6)
    def forward(self, img):
        x = F.relu(self.conv1(img))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv3_1(x))
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv4_1(x))
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv5_1(x))
        x = F.relu(self.conv6(x))
        x = self.conv6_1(x)
        x = self.maxpool(x)
        x = x.view(-1, 5120)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

    def init_weights(self):
        for m in self.modules():
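            # a tuple matches both layer types; (nn.Conv2d or nn.Linear) is just nn.Conv2d
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                ...  # initialization code not included in the post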

This is my model above. The input is a stack of images with shape [batch_size, 6, 384, 1280], and the output has shape [batch_size, 6].
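As a sanity check on the `fc1` input size, the spatial size after each layer follows the standard convolution/pooling output formula floor((n + 2p - k) / s) + 1. A small sketch (the helper `out_size` is only for illustration) confirms that a 384 × 1280 input leaves 1024 channels × 1 × 5 positions, i.e. 5120 features, so `x.view(-1, 5120)` matches:

```python
def out_size(n, kernel, stride, padding):
    # Output length of Conv2d/MaxPool2d along one spatial dimension (dilation 1)
    return (n + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for each layer of Dscnvo, in order
layers = [
    (7, 2, 3),  # conv1
    (5, 2, 2),  # conv2
    (5, 2, 2),  # conv3
    (3, 1, 1),  # conv3_1
    (3, 2, 1),  # conv4
    (3, 1, 1),  # conv4_1
    (3, 2, 1),  # conv5
    (3, 1, 1),  # conv5_1
    (3, 2, 1),  # conv6
    (2, 2, 0),  # conv6_1
    (2, 2, 0),  # maxpool
]

h, w = 384, 1280
for k, s, p in layers:
    h, w = out_size(h, k, s, p), out_size(w, k, s, p)

channels = 1024  # out_channels of conv6_1
print(h, w, channels * h * w)  # 1 5 5120
```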

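def forward_pass(batch):
    imgf, imgr, tf, af = batch
    # imgf, imgr, tf, af = ...  (line garbled in the post)
    # imgr = torch.cat((imgf[:, 3:6, :, :], imgf[:, 0:3, :, :]), 1)
    # targets for the reversed image stack are the negated forward targets
    tr = tf * -1
    ar = af * -1
    # the same model (shared parameters) processes both stacks
    outf = model(imgf)
    outr = model(imgr)
    # output channels 0:3 are compared with tf/tr, channels 3:6 with af/ar
    loss1 = (alpha1 * criterion(outf[:, 0:3].unsqueeze(1), tf)
             + beta1 * criterion(outf[:, 3:6].unsqueeze(1), af))
    loss2 = (alpha2 * criterion(outr[:, 0:3].unsqueeze(1), tr)
             + beta2 * criterion(outr[:, 3:6].unsqueeze(1), ar))
    loss = loss1 + loss2
    return loss, outf, tf, af

for epoch in range(1, num_epochs + 1):
    running_loss = 0.0
    for i, batch in enumerate(trainloader):
        optimizer.zero_grad()
        loss, outf, tf, af = forward_pass(batch)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()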

Thanks for the code. Your model is quite deep, so it might suffer from, e.g., vanishing gradients.
I would scale it down a bit and try to overfit a small data sample with it.
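One way to look for vanishing gradients is to log the gradient norm of each layer after `loss.backward()`: norms that shrink by orders of magnitude towards the first layers suggest the signal is not reaching them. A minimal sketch with a placeholder deep MLP (the depth, widths, and the helper `grad_norms` are assumptions for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder deep network; swap in the real model and batch
layers = []
for _ in range(10):
    layers += [nn.Linear(32, 32), nn.ReLU()]
model = nn.Sequential(*layers, nn.Linear(32, 6))

def grad_norms(model):
    # L2 norm of each weight gradient, in layer order (first layer first)
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if name.endswith("weight") and p.grad is not None}

loss = model(torch.randn(8, 32)).pow(2).mean()
loss.backward()
for name, norm in grad_norms(model).items():
    print(f"{name:>10}: {norm:.2e}")
```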


Could you point me to an example that uses Ray Tune to tune hyperparameters against the validation loss, to avoid overfitting the model?

Thank you very much in advance.

Unfortunately, I’m not familiar with Ray Tune and don’t know any good resources.


Hey @nivesh_gadipudi, just wanted to repost here (I work on Ray Tune):

An easy way to prevent overfitting is simply to return `done=True` when the validation loss begins to diverge.

# or, if using the class API:
def step(self):
    return {"done": True}  # plus any other metrics to report
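The divergence condition itself does not depend on Ray Tune. A framework-agnostic sketch, assuming a simple patience rule (stop once the validation loss has not improved on its earlier best for `patience` consecutive epochs); `should_stop` and the rule are illustrative choices, not part of Ray Tune's API, but a check like this could decide when to return `done=True`:

```python
def should_stop(val_losses, patience=3):
    """Return True once the last `patience` validation losses are all
    worse than the best loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss > best_before for loss in val_losses[-patience:])

history = [1.0, 0.8, 0.7, 0.72, 0.75, 0.8]
print(should_stop(history))      # True: no improvement over 0.7 for 3 epochs
print(should_stop(history[:5]))  # False
```

A larger `patience` tolerates noisier validation curves at the cost of training a few extra epochs.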

Hope that helps!


Thanks @richardliaw for your suggestions

@ptrblck Interestingly, when I removed the `init_weights()` call from `__init__()`, the model started learning. But `__init__()` is only executed once, when the instance is created, right?

Could you please elaborate on this? Many thanks in advance.

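Yes, `__init__()` runs only once, but the weights initialized there are the starting point for the whole training run. I guess the default initializations are more beneficial for your particular use case than your custom one, which would explain the better convergence after removing the `init_weights()` call.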
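If a custom scheme is still wanted, here is a sketch of one conventional choice, assuming Kaiming-normal weights (designed for ReLU networks) and zero biases, applied to a tiny placeholder model. Note that in the posted `init_weights`, `nn.Conv2d or nn.Linear` evaluates to just `nn.Conv2d`, so the `Linear` layers were never matched; `isinstance` needs a tuple:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 8, 3, 1, 1)
        self.fc = nn.Linear(8, 6)
        self.init_weights()  # runs once, when the model is constructed

    def forward(self, x):
        x = F.relu(self.conv(x)).mean(dim=(2, 3))  # global average pool
        return self.fc(x)

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):  # match both layer types
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

net = TinyNet()
print((nn.Conv2d or nn.Linear) is nn.Conv2d)  # True
print(net.fc.bias.abs().sum().item())         # 0.0
```

Comparing the loss curve of this variant against the default initialization on the same small subset is a quick way to see which starting point suits the problem.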
