LSTM / GNN network weights don't get updated

Please, someone help me!
Hi there,
I'm trying to make a GNN (more specifically a GATv2, which is basically an attention network for graphs) evolve with the help of an LSTMCell.
The problem is that when I write the LSTM output into the PyTorch GATv2 weights, the GAT weights don't get updated for the next epoch, although the LSTM weights do.
The GAT weights are of type torch.nn.Parameter, and the LSTM output I want to write into that weight is a torch.FloatTensor.

# Two GATv2 layers; the LSTMCell is meant to "evolve" conv1's weight matrix
self.conv1 = GATv2Conv(num_node_features, hidden_channels[0])
self.conv2 = GATv2Conv(hidden_channels[0], 2)

hiddenSize = num_node_features * hidden_channels[0]  # currently unused

self.LSTM1 = nn.LSTMCell(num_node_features, num_node_features)

# hidden and cell state of the LSTMCell, stored as learnable parameters
# with the same shape as conv1.lin_l.weight
self.h1 = nn.Parameter(nn.init.xavier_uniform_(torch.empty(hidden_channels[0], num_node_features)), requires_grad=True)
self.c1 = nn.Parameter(nn.init.xavier_uniform_(torch.empty(hidden_channels[0], num_node_features)), requires_grad=True)
      

The problem is that self.conv1.lin_l.weight doesn't get updated after conv1 from the 2nd epoch onwards, although it does change after LSTM1:

def forward(self, data):
    # first GAT layer
    x = self.conv1(data.x, data.edge_index)

    print("weight gat", self.conv1.lin_l.weight)

    # feed conv1's weight matrix through the LSTMCell
    A = self.LSTM1(self.conv1.lin_l.weight, (self.h1, self.c1))

    # overwrite conv1's weight and the LSTM states with brand-new Parameter objects
    self.conv1.lin_l.weight = torch.nn.parameter.Parameter(A[0])
    self.c1 = torch.nn.parameter.Parameter(A[1])
    self.h1 = torch.nn.parameter.Parameter(A[0])

    print("weight after lstm", self.conv1.lin_l.weight)

    x = x.relu()
    x = F.dropout(x, p=0.8, training=self.training)
    x = self.conv2(x, data.edge_index)
    return x

And here is how the model is used:

model.train()
train_loss = 0
for data in train_loader:
    data = data.to(device)
    optimizer.zero_grad()
    out = model(data)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    _, pred = out[data.train_mask].max(dim=1)
    loss.backward()

    train_loss += loss.item() * data.num_graphs
    optimizer.step()
train_loss /= len(train_loader.dataset)
    

I guess there might be a problem with how I assign the data and types, or with the ReLU, the softmax, or the backpropagation. Can anyone guess why the GATv2 weights don't get updated?
PS: I'm using Google Colab for execution.

This line of code looks wrong:

self.conv1.lin_l.weight=torch.nn.parameter.Parameter(A[0])

as you are assigning a new parameter to the .weight attribute, which would be missing in the optimizer. I don't know how and where self.conv1 is used, but changing the parameters after they were used also sounds generally wrong.
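
As a quick self-contained illustration (using a plain nn.Linear rather than your model), the re-assigned .weight is a different object than the one the optimizer holds a reference to:

import torch
import torch.nn as nn

lin = nn.Linear(4, 4)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)   # optimizer stores references to the current Parameters

lin.weight = nn.Parameter(torch.randn(4, 4))      # a brand-new Parameter object replaces the old one

print(any(lin.weight is p for group in opt.param_groups for p in group["params"]))  # False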

In any case, if you want to manipulate the weights before they are used, you could use .copy_() and wrap the manipulation in a with torch.no_grad() block.
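
For example, something along these lines (a standalone sketch, with a plain nn.Linear standing in for conv1.lin_l and a random tensor in place of the LSTMCell output):

import torch
import torch.nn as nn

lin = nn.Linear(4, 8)          # stand-in for self.conv1.lin_l
new_w = torch.randn(8, 4)      # stand-in for the tensor produced by the LSTMCell

with torch.no_grad():          # needed, since in-place ops on a leaf that requires grad are otherwise disallowed
    lin.weight.copy_(new_w)    # in-place update; the Parameter object itself stays the same

This keeps the optimizer pointing at the very same Parameter object instead of replacing it with a new one it has never seen.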


Thanks a lot for helping.
I want to change the weights by using the LSTM, so that the output of the LSTM becomes the new weight of the first layer (self.conv1). But after I change the weights of conv1, they don't get updated in conv1 anymore.
I've tried your .copy_() suggestion, a deepcopy, and a with torch.no_grad() block, but the weight still doesn't get updated.
PS: I don't know whether it could be because the weights are not graph leaves and are not created explicitly by the user.
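
(For reference, a quick standalone check with made-up sizes confirms that an LSTMCell output is indeed a non-leaf activation rather than a graph leaf:)

import torch
import torch.nn as nn

cell = nn.LSTMCell(4, 4)
inp = torch.randn(3, 4)
h, c = cell(inp, (torch.zeros(3, 4), torch.zeros(3, 4)))

print(h.is_leaf)        # False: h is an activation computed from the LSTM parameters
print(h.requires_grad)  # True: it still carries history back to those parameters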

You could use the output activation as a parameter for another layer by e.g. using the functional API, but note that the optimizer will not be able to update an activation as it’s created in each forward pass using the parameters of the LSTM module.
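
For illustration only, here is how that pattern could look: a hedged sketch with a hypothetical HyperLinear module, using a plain F.linear instead of GATv2Conv (which has no simple functional counterpart). The LSTM output is consumed directly as the weight, so the gradient flows back into the LSTM's own parameters:

import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    # hypothetical example: an LSTMCell generates the weight of a linear layer on the fly
    def __init__(self, in_features, out_features):
        super().__init__()
        self.lstm = nn.LSTMCell(in_features, in_features)
        # learnable initial hidden/cell states, one "row" per output feature
        self.h = nn.Parameter(torch.zeros(out_features, in_features))
        self.c = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        h, _ = self.lstm(self.h, (self.h, self.c))  # h: [out_features, in_features]
        return F.linear(x, h)                       # use the activation directly as the weight

model = HyperLinear(8, 16)
out = model(torch.randn(2, 8))
out.sum().backward()  # gradients reach model.lstm.* (and model.h / model.c), which an optimizer can update

Only the LSTMCell's parameters (plus the initial states) are registered with an optimizer here; the generated weight h is just an activation that is recomputed from them in every forward pass.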


Thanks a lot again. You're right, I guess the problem is with loss.backward() and the optimizer, which no longer update the GAT weights, although I didn't figure out the exact reason.
Do you know any way to build the model I need so that the weights do get updated?

No, I don’t know how this approach would properly work.
Currently you are trying to optimize an output activation which is itself created using the parameters from the LSTM module (which are passed to the optimizer and updated).
This would mean that even if the optimizer could update the output activation, these updates would be wiped in the next iteration since the updated LSTM parameters will create a new output activation.