From one iteration to another Tensor is converting to a NoneType

Hello all!
I'm new to PyTorch and I'm trying to implement PGD.
I get this error on the second iteration:
sign(): argument 'input' (position 1) must be Tensor, not NoneType

I don't understand why.
Any help/advice would be appreciated.

Find below the code.


def compute(self, x, y):
    # x: (100, 1, 28, 28) and y: (100,)
    x_adv = x.requires_grad_(True)
    for i in range(self.num_iter):
        print("x_adv", x_adv)
        criterion = nn.NLLLoss()
        prediction = self.model(x_adv)
        loss = criterion(prediction, y)
        print("loss", loss)
        print(type(x_adv), x_adv.shape)
        gradients = x_adv.grad
        gradients = self.alpha * torch.sign(gradients)
        print("gradients with sign and alpha ", gradients)
        x_adv = x_adv + gradients
        print("x_adv + gradients ", x_adv)
        # x_adv = torch.max(torch.min(x_adv, x + self.eps), x - self.eps)  # project back into the eps-ball / correct range

You are overwriting x_adv in this line of code:

x_adv = x_adv + gradients

which will create a non-leaf tensor, and you should get a warning when trying to access its .grad attribute:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See for more informations.

You could either use the suggested .retain_grad() operation or create a new leaf tensor via x_adv = x_adv.detach().requires_grad_(True) at the end of the iteration.
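A minimal runnable sketch of a corrected loop along these lines (the model, shapes, and hyperparameter values below are stand-ins, not taken from the original post): it calls loss.backward() so that x_adv.grad is actually populated, and re-creates a leaf tensor at the end of each iteration as suggested above.

```python
import torch
import torch.nn as nn

# Stand-in model and data with the shapes from the post.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()
x = torch.rand(100, 1, 28, 28)
y = torch.randint(0, 10, (100,))
eps, alpha, num_iter = 0.3, 0.01, 3  # illustrative values

x_adv = x.clone().detach().requires_grad_(True)
for _ in range(num_iter):
    loss = criterion(model(x_adv), y)
    loss.backward()                          # populates x_adv.grad
    with torch.no_grad():
        x_adv = x_adv + alpha * torch.sign(x_adv.grad)
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project back into the eps-ball
    x_adv = x_adv.detach().requires_grad_(True)  # fresh leaf for the next iteration
```

Because each iteration starts from a new leaf tensor, x_adv.grad is a fresh Tensor after every backward() call instead of None.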

Hi, I am also getting the above-mentioned error after the first iteration. Below is a two-layer residual LSTM. The input size differs per layer but the hidden size is the same (256): the first layer's input size is 1088 and the second's is 256. I think the error is in self.weight_ir, but I am not sure. Can you guide me?

import torch
import torch.jit as jit
from torch import Tensor
from torch.nn import Parameter
from typing import List, Tuple

class RLSTMCell(jit.ScriptModule):
    def __init__(self, input_size, hidden_size, dropout=0.):
        super(RLSTMCell, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.weight_ih = Parameter(torch.zeros(4 * hidden_size, input_size))
        self.weight_hh = Parameter(torch.zeros(4 * hidden_size, hidden_size))
        self.bias_ih = Parameter(torch.zeros(4 * hidden_size))
        self.bias_hh = Parameter(torch.zeros(4 * hidden_size))
        # projects the input to hidden_size for the residual connection
        self.weight_ir = Parameter(torch.zeros(hidden_size, input_size))

    @jit.script_method
    def forward(self, input: Tensor, state: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
        hx, cx = state
        gates = (, self.weight_ih.t()) + self.bias_ih +
       , self.weight_hh.t()) + self.bias_hh)
        ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

        ingate = torch.sigmoid(ingate)
        forgetgate = torch.sigmoid(forgetgate)
        cellgate = torch.tanh(cellgate)
        outgate = torch.sigmoid(outgate)

        cy = (forgetgate * cx) + (ingate * cellgate)
        ry = torch.tanh(cy)  # equation 12 in the paper
        # hy = outgate * torch.tanh(cy)
        if self.input_size == self.hidden_size:
            hy = outgate * (ry + input)  # equation 15 in the paper
        else:
            hy = outgate * (ry +, self.weight_ir.t()))
        return hy, (hy, cy)

class LSTMLayer(jit.ScriptModule):
    def __init__(self, input_size, hidden_size):
        super(LSTMLayer, self).__init__()
        self.layer1 = RLSTMCell(input_size, hidden_size)
        self.layer2 = RLSTMCell(hidden_size, hidden_size)

    @jit.script_method
    def forward(self, input: Tensor, state: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
        inputs = input.unbind(0)
        outputs = torch.jit.annotate(List[Tensor], [])
        for i in range(len(inputs)):
            out, state = self.layer1(inputs[i], state)
            out, state = self.layer2(state[0], state)
            outputs += [out]
        return torch.stack(outputs), state

When I made the input sizes of these RLSTM cells the same, like

        self.layer1 = RLSTMCell(input_size, hidden_size)
        self.layer2 = RLSTMCell(input_size, hidden_size)

then it works fine when I access param.grad. But I guess in multilayer LSTMs the second layer takes its input from the output of the first layer, while in the setup above I am sending the original input to both layers. Is this approach also okay?
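For reference, in a standard multilayer LSTM only the first cell sees the raw input size; each later cell consumes the previous layer's hidden output. A minimal sketch with torch.nn.LSTMCell standing in for RLSTMCell (the sizes mirror the post, everything else is illustrative):

```python
import torch
import torch.nn as nn

# Sizes from the post: raw features are 1088-dim, hidden state is 256-dim.
input_size, hidden_size, batch = 1088, 256, 8
layer1 = nn.LSTMCell(input_size, hidden_size)
layer2 = nn.LSTMCell(hidden_size, hidden_size)  # input = layer1's output size

x = torch.randn(batch, input_size)
h1 = torch.zeros(batch, hidden_size)
c1 = torch.zeros(batch, hidden_size)
h2 = torch.zeros(batch, hidden_size)
c2 = torch.zeros(batch, hidden_size)

h1, c1 = layer1(x, (h1, c1))
h2, c2 = layer2(h1, (h2, c2))  # feed layer1's output, not the raw input
h2.sum().backward()

# every parameter of both layers receives a gradient
assert all(p.grad is not None for p in layer1.parameters())
assert all(p.grad is not None for p in layer2.parameters())
```

With this stacking, no layer after the first ever needs the 1088-dim input, so there is no size mismatch and all .grad attributes are populated after backward().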