No accuracy change in custom model despite param.requires_grad = True

Hello PyTorch community! Glad to be here.

I have a problem with my custom model that I've been stuck on for almost two weeks.

My model requires, let's say, N ordinary binary classifiers and 2 matrices whose entries should also be learnable parameters. For an input x we get the output of every classifier, collect the outputs into a tensor, and then multiply that tensor with the two matrices in a certain way.

I created two classes outside the main model: BinaryClassifier and ProbMatrix; the latter holds a matrix as a learnable parameter. Inside the model, an nn.ModuleList called ClassifiersList holds all the BinaryClassifiers, and ProbMatrix1 and ProbMatrix2 live outside that list. The model's parameters are therefore the ModuleList of N BinaryClassifiers plus the two ProbMatrix modules.

In the Model.forward(x) method the model runs the following operations:

  1. Create a tensor Realisation of size N, whose n-th entry is the output of the n-th BinaryClassifier(x), by iterating over ClassifiersList, then .repeat() it to get a 2D tensor.

  2. Set P as the result of algebraic operations on Realisation, ProbMatrix1.matrix and ProbMatrix2.matrix. For the latter two I implemented a method for subtraction.

  3. Compute CrossEntropyLoss based on part of a larger tensor Q into which the data of P is copied. Neither P nor Q should be a parameter.

I really tried everything; requires_grad is True for all parameters, yet no accuracy increase or decrease occurs. From some quick print debugging I am sure the problem lies with the ProbMatrices: their parameters do not change at all.

Here is some code. It is a mess, but my main question is: do I need to register those unwanted matrices (P and Q) as parameters and give them requires_grad = True? I can provide more code if needed.

Thank you a lot!

import math

import torch
from torch import Tensor, nn
from torch.nn import Linear, Parameter


class BinaryNet(nn.Module):

    def __init__(self, input_shape):
        super(BinaryNet, self).__init__()

        self.LinearLayer1 = Linear(input_shape, 1)
        nn.init.xavier_uniform_(self.LinearLayer1.weight)

    def forward(self, x):
        # Single linear layer followed by a sigmoid -> scalar probability
        x = torch.flatten(x)
        x = self.LinearLayer1(x)
        x = torch.squeeze(x)
        x = torch.sigmoid(x)
        return x

class ProbabilityMatrix(nn.Module):

    def __init__(self, c_count, l_count) -> None:
        super(ProbabilityMatrix, self).__init__()
        self.classifleaves_size = c_count + l_count
        self.classif_size = c_count
        # Learnable (c_count + l_count) x c_count matrix
        self.matrix = Parameter(torch.Tensor(self.classifleaves_size, self.classif_size))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        nn.init.kaiming_uniform_(self.matrix, a=math.sqrt(5))

    def extra_repr(self) -> str:
        return 'Classifiers + Leaves = {}, Classifiers = {}'.format(
            self.classifleaves_size, self.classif_size)

    def subtract_prob_matrices(self, other):
        return torch.sub(self.matrix, other.matrix)

class Graph(nn.Module):

    def __init__(
            self,
            l_count,
            c_count,
            step_count,
            input_shape,
            use_cuda=False):

        super(Graph, self).__init__()

        self.l_count = l_count
        self.c_count = c_count
        self.step_count = step_count
        self.input_shape = input_shape

        self.M0matrix = ProbabilityMatrix(c_count, l_count)
        self.M1matrix = ProbabilityMatrix(c_count, l_count)
        self.classifiers_list = nn.ModuleList()

        for i in range(c_count):
            model = BinaryNet(input_shape)
            self.classifiers_list.append(model)

    def forward(self, x):

        binary_realisation = Tensor(
            [model.forward(x).item() for model in self.classifiers_list[0:self.c_count]]
        ).repeat(self.l_count + self.c_count, 1)

        P_Matrix = binary_realisation * (self.M0matrix.matrix - self.M1matrix.matrix) + self.M0matrix.matrix

        Q_Matrix = torch.ones(
            (self.l_count+self.c_count, self.l_count+self.c_count), requires_grad=True)
        for col in range(self.l_count+self.c_count):
            if col < self.c_count:
                for row in range(self.l_count+self.c_count):
                    with torch.no_grad():
                        Q_Matrix[row][col] = P_Matrix[row][col]
            else:
                for row in range(self.c_count):
                    with torch.no_grad():
                        Q_Matrix[row][col] = 0.

        final_vec = torch.zeros(
            (self.l_count+self.c_count, 1), requires_grad=True)
        with torch.no_grad():
            final_vec[0] = 1

            for step in range(self.step_count):
                final_vec = torch.matmul(Q_Matrix, final_vec)

            final_leaf = final_vec[self.l_count:]
            final_leaf = torch.unsqueeze(final_leaf, 0)
            final_leaf = torch.squeeze(final_leaf, 2)
            final_leaf = torch.tensor(final_leaf, requires_grad=True)
        return final_leaf

I’m not familiar with your use case, but note that rewrapping tensors in new tensors will detach them from the computation graph and no gradients will be calculated.
Also, calling .item() will return a Python literal and would thus detach the result.
This is the case in your forward method:

binary_realisation = Tensor(
    [model.forward(x).item() for model in self.classifiers_list[0:self.c_count]]
).repeat(self.l_count + self.c_count, 1)

binary_realisation will be a new tensor, so Autograd will not backpropagate to the models or any of their parameters.
If that’s not desired, remove the Tensor usage as well as the .item() operation and create the new tensor via torch.stack or torch.cat, as sketched below.
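
For example, a minimal sketch of that change (an assumption based on the posted BinaryNet, where each classifier returns a 0-dim tensor):

outputs = [model(x) for model in self.classifiers_list]  # each output stays attached to the graph
binary_realisation = torch.stack(outputs)  # shape (c_count,), still differentiable
binary_realisation = binary_realisation.repeat(self.l_count + self.c_count, 1)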

Also, Autograd won’t track any operations inside the no_grad() block, but I assume that’s intended.
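
If gradient flow through Q_Matrix is actually wanted, one possible sketch (assuming the shapes from the posted forward, i.e. P_Matrix of shape (l_count + c_count, c_count); not a drop-in fix) would build Q_Matrix with torch.cat instead of in-place writes under no_grad():

# first c_count columns come from P_Matrix; the remaining columns are zeros on top of ones
right_block = torch.cat([
    torch.zeros(self.c_count, self.l_count),
    torch.ones(self.l_count, self.l_count),
], dim=0)
Q_Matrix = torch.cat([P_Matrix, right_block], dim=1)  # differentiable w.r.t. P_Matrix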


Thank you a lot for the response!

The problem turned out to be exploding values when taking the n-th power of Q, which made the loss go to infinity.
Nevertheless, your answer is very helpful, since I had no clue that rewrapping tensors would mess with backprop. Thanks once again!
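
For anyone who finds this thread later, here is a tiny standalone illustration (a toy matrix, not the original Q) of how repeated multiplication by a matrix with entries above 1 blows up:

import torch

Q = torch.full((4, 4), 1.5)  # toy matrix with entries > 1
v = torch.zeros(4, 1)
v[0] = 1.0
for step in range(20):
    v = Q @ v  # here each step scales the entries by roughly a factor of 6
print(v.max())  # about 1.5 * 6**19, far too large for a stable loss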