Hi, I recently built a project around a DDPG agent implemented in PyTorch.
When I sample a batch of (state, action, reward, next_state) transitions from the replay buffer and try to update the weights of the four networks, loss.backward() does not seem to update the gradients at all, and no data shows up on the TensorBoard page.
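To make "the grads are not updated" concrete, this is roughly how I inspect the gradients right after calling backward() (a minimal sketch; the helper report_grads and where I call it are illustrative, not part of the project code):

    import torch.nn as nn

    def report_grads(module: nn.Module, name: str) -> None:
        # Print the gradient norm of every parameter, or "None" if backward() left no grad behind
        for pname, param in module.named_parameters():
            norm = "None" if param.grad is None else f"{param.grad.norm().item():.6f}"
            print(f"{name}.{pname}: grad norm = {norm}")

    # e.g. called right after actor_loss.backward() inside learn():
    #     report_grads(self.actor, "actor")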
Here is the code for the “update based on the buffer” part:
def learn(self, adj):
    # Do nothing until the replay buffer holds at least one full batch
    if self.buffer.mem_cntr < self.batch_size:
        return

    state_vectors, action_vectors, rewards, next_state_vectors = \
        self.buffer.sample_buffer(self.batch_size)
    adj = T.tensor(adj, dtype=T.float32).to(self.actor.device)

    for i in range(self.batch_size):
        # Move a single transition to the device, adding a batch dimension of 1
        state_vector = T.tensor(state_vectors[i], dtype=T.float32).unsqueeze(0).to(self.actor.device)
        next_state_vector = T.tensor(next_state_vectors[i], dtype=T.float32).unsqueeze(0).to(self.actor.device)
        action_vector = T.tensor(action_vectors[i], dtype=T.float32).unsqueeze(0).to(self.actor.device)
        reward = T.tensor(rewards[i], dtype=T.float32).to(self.actor.device)

        # Critic update: build the TD target from the target networks and
        # regress the online critic's Q-value onto it
        next_target_action_vector = self.target_actor.forward(next_state_vector, adj)
        next_target_q = self.target_critic(next_state_vector, next_target_action_vector, adj)
        q = self.critic.forward(state_vector, action_vector, adj)
        target_q = reward + self.gamma * next_target_q

        self.critic.optimizer.zero_grad()
        critic_loss = F.mse_loss(q, target_q)
        critic_loss.backward()
        self.critic.optimizer.step()

        # Actor update: ascend the critic's value of the actor's own action
        self.actor.optimizer.zero_grad()
        actor_loss = -self.critic.forward(state_vector,
                                          self.actor.forward(state_vector, adj),
                                          adj) / self.batch_size
        actor_loss.backward()
        self.actor.optimizer.step()

    # Polyak-average the target networks toward the online ones
    self.soft_update()
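For context, the losses are pushed to TensorBoard elsewhere in the project; the logging looks roughly like this (a sketch with assumed names: the log_dir, the tag names, and the log_losses helper are placeholders for what the project actually does):

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="runs/ddpg")  # assumed log directory

    def log_losses(writer: SummaryWriter, critic_loss, actor_loss, step: int) -> None:
        # Write both losses as scalars; they appear under the "loss/" tags in TensorBoard
        writer.add_scalar("loss/critic", critic_loss.item(), step)
        writer.add_scalar("loss/actor", actor_loss.item(), step)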
How should I modify the code so that the actor network updates correctly?