Loss is 1 but gradients are zero

Ravi_Raja · April 16, 2022, 6:03am

I am facing an issue where the gradients are 0 even though the loss is not zero: the loss stays at 1 while the gradients are 0. I'm using the MSE loss function. Can anyone please help me debug this?

Training code snippet:

    # Train network
    max_epochs = max_epochs+1
    epoch = 1
    last_acc = 0
    while epoch < max_epochs:
        gcln.train()
        optimizer.zero_grad()
        train_epoch_loss = 0
        accuracy = 0
        datalen = 0
        train_size = 0
        output = []
        target = []
        for batch_idx, (inps, tgts) in enumerate(train_loader):
            tgts = tgts.reshape((tgts.size(0), -1)).to(device)
            tgts = tgts.round()
            inps = inps.to(torch.float)
            outs = gcln(inps).to(device)
            gcln_ = copy.deepcopy(gcln)
            gcln_.cnf_layer_1.layer_or_weights = torch.nn.Parameter(
                gcln_.cnf_layer_1.layer_or_weights.round())
            gcln_.cnf_layer_1.layer_and_weights = torch.nn.Parameter(
                gcln_.cnf_layer_1.layer_and_weights.round())
            gcln_.cnf_layer_2.layer_or_weights = torch.nn.Parameter(
                gcln_.cnf_layer_2.layer_or_weights.round())
            gcln_.cnf_layer_2.layer_and_weights = torch.nn.Parameter(
                gcln_.cnf_layer_2.layer_and_weights.round())
            out_ = gcln_(inps)
            print("out_", out_.shape)
            if architecture == 1:
                t_loss = criterion(
                    outs, tgts[:, current_output].unsqueeze(-1))
                train_epoch_loss += t_loss.item()
            elif architecture == 2:
                t_loss = criterion(outs, tgts)
                train_epoch_loss += t_loss.item()
            elif architecture == 3:
                l = []
                for i in range(num_of_outputs):
                    l.append(criterion(outs[:, i], tgts[:, i]))
                t_loss = sum(l)
                train_epoch_loss += t_loss.item()/num_of_outputs
            train_size += outs.shape[0]

            optimizer.zero_grad()
            t_loss.backward()
            optimizer.step()
            if architecture == 1:
                output.append([abs(e)
                              for e in out_.round().flatten().tolist()])
                target.append(tgts[:, current_output].tolist())
                accuracy += (out_.round().squeeze() ==
                             tgts[:, current_output]).sum()
            elif architecture > 1:
                output.append([abs(e)
                              for e in out_.round().flatten().tolist()])
                target.append(tgts.flatten().tolist())
                accuracy += (out_.round() == tgts).sum()
        train_loss.append(train_epoch_loss)

Thanks

thecho7 · April 16, 2022, 6:13am

To clarify your problem, please give us a code snippet.

Ravi_Raja · April 16, 2022, 6:19am

Added the code snippet for the training loop.

thecho7 · April 16, 2022, 7:47am

I think the code is well written, even though it has some weird parts…
The loss value can get stuck while the gradients are non-zero, but the opposite case (a non-zero loss with zero gradients) should not be possible if your model is well defined.

Check every layer in your model for NaNs or other anomalous values.
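A minimal sketch of that check, assuming a standard `nn.Module`: it registers a forward hook on every module (including the model itself, which matters here because a model may define its parameters directly without submodules) and raises as soon as an output contains NaN or Inf. `add_nan_checks` is a hypothetical helper name; `register_forward_hook` and `torch.isfinite` are standard PyTorch. For NaNs produced during the backward pass, `torch.autograd.set_detect_anomaly(True)` is the built-in option.

```python
import torch

def add_nan_checks(model: torch.nn.Module):
    """Register a forward hook on every module (the root included) that raises
    as soon as that module produces a NaN or Inf output. Returns the hook
    handles so the checks can be removed later."""
    handles = []
    for name, module in model.named_modules():
        # default argument pins the current name for this particular hook
        def hook(mod, inputs, output, name=name or "<root>"):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outs:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    raise RuntimeError(f"non-finite output in module {name}")
        handles.append(module.register_forward_hook(hook))
    return handles
```

Usage: call `handles = add_nan_checks(model)` before training and `h.remove()` on each handle to switch the checks off again.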

InnovArul · April 16, 2022, 8:00am

Ravi_Raja:

            gcln_.cnf_layer_1.layer_or_weights = torch.nn.Parameter(
                gcln_.cnf_layer_1.layer_or_weights.round())
            gcln_.cnf_layer_1.layer_and_weights = torch.nn.Parameter(
                gcln_.cnf_layer_1.layer_and_weights.round())
            gcln_.cnf_layer_2.layer_or_weights = torch.nn.Parameter(
                gcln_.cnf_layer_2.layer_or_weights.round())
            gcln_.cnf_layer_2.layer_and_weights = torch.nn.Parameter(
                gcln_.cnf_layer_2.layer_and_weights.round())

Are you sure that these round() operations do not lead to all elements of weights being 0’s?

thecho7 · April 16, 2022, 8:13am

I don't think gcln_ and gcln are related.
t_loss is computed from gcln.
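A small check of this point, using a toy `nn.Linear` in place of gcln (an assumption for illustration): rounding the parameters of a deep copy leaves the original untouched, and a loss computed from the original never involves the copy.

```python
import copy
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)               # toy stand-in for gcln
before = model.weight.detach().clone()

# Deep-copy the model and replace the copy's weight with a rounded one,
# in the same way the training loop builds gcln_ from gcln.
model_copy = copy.deepcopy(model)
model_copy.weight = torch.nn.Parameter(model_copy.weight.round())

# A loss computed from the ORIGINAL model only.
x = torch.randn(4, 3)
loss = model(x).pow(2).mean()
loss.backward()

print(torch.equal(model.weight.detach(), before))  # True: original weights are not rounded
print(model.weight.grad is not None)              # True: original received gradients
print(model_copy.weight.grad)                     # None: the copy is not in the graph
```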

InnovArul · April 16, 2022, 8:18am

I see. You are right. I didn't notice that :)
Also, there are many unknowns in the question.
What is the range of values in tgts? What about the network design, the final-layer activation, etc.?

Ravi_Raja · April 19, 2022, 6:34am

@thecho7 There are weird parts because the problem itself is unusual. I need the weights to be interpretable: each weight lies between 0 and 1 during training, but by the end I want every weight to be exactly 0 or 1. After training, I read a formula off the learned weights of the network.
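Reading the formula back could look roughly like the sketch below. It assumes the mapping implied by the model's forward pass in this thread: column j of layer_or_weights selects the inputs that are OR-ed into clause j, and column k of layer_and_weights selects the clauses that are AND-ed into output k. `weights_to_cnf` and the variable names are hypothetical.

```python
import torch

def weights_to_cnf(or_w: torch.Tensor, and_w: torch.Tensor, names: list[str]) -> list[str]:
    """Read one CNF formula per output from (rounded) weights.

    Assumes or_w has shape (input_size, hidden_size) and and_w has shape
    (hidden_size, output_size): clause j is the OR of inputs i with
    or_w[i, j] == 1, and output k is the AND of clauses j with and_w[j, k] == 1.
    """
    or_b = or_w.detach().round().bool()
    and_b = and_w.detach().round().bool()
    formulas = []
    for k in range(and_b.shape[1]):
        clauses = []
        for j in range(and_b.shape[0]):
            if not and_b[j, k]:
                continue  # clause j is switched off for output k
            lits = [names[i] for i in range(or_b.shape[0]) if or_b[i, j]]
            # An empty OR evaluates to False in the fuzzy model: 1 - prod(1) = 0.
            clauses.append("(" + " | ".join(lits) + ")" if lits else "False")
        # An empty AND evaluates to True: every gated term equals 1.
        formulas.append(" & ".join(clauses) if clauses else "True")
    return formulas

or_w = torch.tensor([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 inputs, 2 clauses
and_w = torch.tensor([[1.0], [1.0]])                        # 1 output using both clauses
print(weights_to_cnf(or_w, and_w, ["x0", "x1", "x2"]))      # ['(x0 | x1) & (x1 | x2)']
```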

Here’s the code for model:

class CNF_Netowrk(torch.nn.Module):
    def __init__(self, input_size, output_size, hidden_size, device) -> None:
        super().__init__()
        self.device = device
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.layer_or_weights = torch.nn.Parameter(
            torch.Tensor(
                self.input_size, self.hidden_size
            ).uniform_(0., 1.).to(dtype=torch.double).to(self.device)
        )

        self.layer_and_weights = torch.nn.Parameter(
            torch.Tensor(
                self.hidden_size, self.output_size
            ).uniform_(0., 1.).to(dtype=torch.double).to(self.device)
        )

    def apply_gates(self, x, y):
        return torch.mul(x, y)

    def forward(self, inputs):
        with torch.no_grad():
            self.layer_or_weights.data.clamp_(0.0, 1.0)
            self.layer_and_weights.data.clamp_(0.0, 1.0)

        # gated_inputs.shape: batch_size x hidden_size
        gated_inputs = self.apply_gates(self.layer_or_weights, inputs)
        o = 1 - gated_inputs

        # or_res.shape: batch_size x K
        or_res = 1 - util.tnorm_n_inputs(o)
        or_res = or_res.unsqueeze(-1)

        # gated_or_res.shape: batch_size x K
        gated_or_res = self.apply_gates(self.layer_and_weights, or_res)
        gated_or_res = torch.add(
            gated_or_res, 1.0 - self.layer_and_weights, alpha=1)

        # out.shape: batch_size x output_size
        outs = util.tnorm_n_inputs(gated_or_res).unsqueeze(-1)

        return outs

Also, I checked the outputs of each layer; there are no NaNs in them.

@InnovArul gcln_ is a deep copy of gcln whose weights are rounded to 0s and 1s, so that I can use it to produce binary outputs to compare with tgts.

tgts is a binary vector.

mMagmer · April 19, 2022, 7:43am

Hi,
maybe you're cutting the computation graph somewhere in the forward pass.
I think you can check this by filling parameter.grad with a non-zero value,
then calling backward on the loss to see whether it changes at all.
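The suggested test might look like the following sketch. `grad_fill_test` and its `loss_fn(model)` argument are hypothetical; the idea relies only on the fact that `backward()` accumulates into `.grad`, so an entry that still equals the fill value afterwards received no gradient (either an all-zero gradient or none at all).

```python
import torch

def grad_fill_test(model: torch.nn.Module, loss_fn, fill: float = 1.0) -> dict:
    """Pre-fill every .grad with `fill`, run backward once, and report which
    parameters' .grad changed. backward() accumulates into .grad, so a value
    still exactly equal to `fill` means the parameter got no gradient or an
    all-zero one (very small gradients can also be lost to rounding)."""
    for p in model.parameters():
        p.grad = torch.full_like(p.detach(), fill)
    loss = loss_fn(model)  # hypothetical: any callable returning a scalar loss
    loss.backward()
    return {name: not torch.all(p.grad == fill).item()
            for name, p in model.named_parameters()}
```

In the thread's setting, `loss_fn` would run gcln on a batch and apply criterion as the training loop does.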

InnovArul · April 19, 2022, 8:28am

Just a hypothesis. Can you plot the distribution of self.layer_and_weights at every training iteration?
Is there a chance that after some iterations of training, self.layer_and_weights goes to 0?
Can you verify that?
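A lightweight alternative to plotting at every iteration is to log a few summary numbers per step and plot them afterwards. `weight_stats` is a hypothetical helper; it assumes the weights are kept in [0, 1], as the clamp in the model's forward pass does, and uses `torch.histc` for a coarse histogram.

```python
import torch

def weight_stats(w: torch.Tensor, eps: float = 1e-3, bins: int = 10) -> dict:
    """Summarise a weight tensor whose entries lie in [0, 1]: the mean, the
    fraction of entries stuck near 0 or near 1, and a coarse histogram."""
    w = w.detach()
    return {
        "mean": w.mean().item(),
        "frac_near_0": (w < eps).double().mean().item(),
        "frac_near_1": (w > 1 - eps).double().mean().item(),
        # histogram over [0, 1], e.g. for plotting the distribution later
        "hist": torch.histc(w.float(), bins=bins, min=0.0, max=1.0).tolist(),
    }
```

For example, after `optimizer.step()`: `history.append(weight_stats(gcln.cnf_layer_1.layer_and_weights))`, where `history` is a list created before the loop.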

Ravi_Raja · April 20, 2022, 11:01am

No, the weights are not going to 0, but they stay constant after some time, which is because the gradients are 0.

Ravi_Raja · April 20, 2022, 11:06am

@mMagmer Yes, I guess you are right: the gradients don't change at all after doing what you suggested. But why doesn't this happen for other examples? And how can I find out what is cutting the computation graph?

mMagmer · April 20, 2022, 11:19am

Now that I think about it, that test is not the right way to check the computation graph.
I want to say the problem is in the util.tnorm_n_inputs part, but I'm not sure.
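One way to tell a cut graph apart from a genuinely zero gradient is `torch.autograd.grad` with `allow_unused=True`: it returns `None` for a parameter that is not part of the loss's graph and a tensor (possibly all zeros) for one that is. The filling test cannot separate these cases because both leave `.grad` unchanged. `gradient_report` is a hypothetical helper name.

```python
import torch

def gradient_report(loss, named_params) -> dict:
    """Classify each parameter's gradient w.r.t. `loss` as 'disconnected'
    (not in the loss's graph), 'zero' (connected, but all zeros) or 'nonzero'.
    Uses torch.autograd.grad, so the parameters' .grad fields are not modified."""
    pairs = [(n, p) for n, p in named_params if p.requires_grad]
    names = [n for n, _ in pairs]
    if not loss.requires_grad:
        # The whole graph is cut: nothing upstream of the loss requires grad.
        return {n: "disconnected" for n in names}
    grads = torch.autograd.grad(loss, [p for _, p in pairs],
                                allow_unused=True, retain_graph=True)
    report = {}
    for name, g in zip(names, grads):
        if g is None:
            report[name] = "disconnected"
        elif torch.count_nonzero(g) == 0:
            report[name] = "zero"
        else:
            report[name] = "nonzero"
    return report
```

In the training loop this could be called as `gradient_report(t_loss, gcln.named_parameters())` just before `t_loss.backward()`; `retain_graph=True` keeps the graph alive for that subsequent backward call.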

Ravi_Raja · April 20, 2022, 11:30am

Here’s the implementation for tnorm_n_inputs. I’m using the product case. Do you think that is problematic?

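def tnorm_n_inputs(self, inp):
    '''
    Fuzzy alternative for logical AND, reduced over dim -2
    '''
    if self.name == "godel":
        # Gödel t-norm: elementwise minimum
        out, _ = torch.min(inp, dim=-2)
        return out
    elif self.name == "product":
        # Product t-norm: product of the inputs
        return torch.prod(inp, -2)
    else:
        # Fail loudly instead of silently returning None
        raise ValueError(f"Wrong name: {self.name!r}")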
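One property of the product case that may be worth checking: the gradient of `torch.prod` with respect to each factor is the product of the other factors. A single zero factor therefore leaves only that factor with a gradient, and two or more zero factors give every factor a zero gradient. In CNF_Netowrk the factors are `1 - w * x`, which are exactly 0 whenever an OR weight and its input are both 1 (and the clamp in `forward` can hold weights at exactly 1). This is a sketch for inspection, not a confirmed diagnosis; `prod_grad` is a hypothetical helper.

```python
import torch

def prod_grad(values):
    """Gradient of torch.prod(x, -1) with respect to each element of x."""
    x = torch.tensor(values, dtype=torch.double, requires_grad=True)
    torch.prod(x, -1).backward()
    return x.grad.tolist()

print(prod_grad([0.5, 0.25, 0.75]))  # [0.1875, 0.375, 0.125]: product of the other two
print(prod_grad([0.0, 0.25, 0.75]))  # [0.1875, 0.0, 0.0]: only the zero factor gets a gradient
print(prod_grad([0.0, 0.0, 0.75]))   # [0.0, 0.0, 0.0]: two zeros, every gradient is 0
```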

mMagmer · April 20, 2022, 11:40am

No problem in this case, but torch.min will cut the graph.

Ravi_Raja · April 20, 2022, 11:53am

Yeah. I’m not using that anywhere.

mMagmer · April 20, 2022, 11:55am

mMagmer · April 20, 2022, 12:00pm

I think I am wrong about this,
sorry.
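For reference, a quick check confirms that `torch.min` with a `dim` argument does not cut the graph: the returned values carry a `grad_fn`, and the gradient flows back only to the element selected as the minimum, so every other element gets a zero gradient. The tensor below is an arbitrary example.

```python
import torch

x = torch.tensor([[0.3, 0.9], [0.7, 0.2]], requires_grad=True)
values, indices = torch.min(x, dim=-2)   # column-wise minimum, as in the "godel" branch
values.sum().backward()

print(values.grad_fn is not None)  # True: the result is still part of the graph
print(indices.tolist())            # [0, 1]: rows holding each column's minimum
print(x.grad)                      # 1 at each selected minimum, 0 elsewhere
```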

Ravi_Raja · April 20, 2022, 12:01pm

No worries. Thanks for helping.

Ravi_Raja · April 20, 2022, 2:22pm

@InnovArul Can you explain what happens if layer_and_weights goes to zero after some iterations?
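One way to explore this question is to run the forward pass by hand with layer_and_weights set to zero. `cnf_forward` below is a hypothetical standalone re-implementation of `CNF_Netowrk.forward` with the product t-norm (without the clamp and the final `unsqueeze`), assuming inputs of shape (batch, input_size, 1); the inputs, weights and all-zero targets are made up. With every AND weight at 0, each gated term becomes `0 * or_res + 1 = 1`, so the output is 1 for every sample and the OR weights receive exactly zero gradient, while the AND weights still get a non-zero gradient.

```python
import torch
import torch.nn.functional as F

def cnf_forward(x, w_or, w_and):
    """Hypothetical standalone version of CNF_Netowrk.forward with the product
    t-norm (no clamp, no final unsqueeze).
    Shapes: x (batch, input, 1), w_or (input, hidden), w_and (hidden, output)."""
    or_res = 1 - torch.prod(1 - w_or * x, dim=-2)        # (batch, hidden)
    gated = w_and * or_res.unsqueeze(-1) + (1 - w_and)    # (batch, hidden, output)
    return torch.prod(gated, dim=-2)                      # (batch, output)

torch.manual_seed(0)
x = torch.randint(0, 2, (4, 3, 1)).double()                        # made-up binary inputs
w_or = torch.rand(3, 2, dtype=torch.double, requires_grad=True)
w_and = torch.zeros(2, 1, dtype=torch.double, requires_grad=True)  # AND weights at 0

out = cnf_forward(x, w_or, w_and)
loss = F.mse_loss(out, torch.zeros_like(out))  # made-up all-zero targets
loss.backward()

print(out.flatten())  # every output is exactly 1: all clauses are switched off
print(loss.item())    # 1.0 in this toy setup
print(w_or.grad)      # all zeros: the OR layer no longer affects the output
print(w_and.grad)     # negative entries: the AND weights still receive a gradient
```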