In the following example, the variable hidden is reassigned at every time step. Is this assignment an in-place operation? If it is, does calling backward() still yield the correct gradient?
Thank you!

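def train(category_tensor, line_tensor):
    rnn.zero_grad()
    hidden = rnn.init_hidden()
    # Feed the sequence one element at a time; hidden is reassigned on every step.
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    # The loss is computed on the output of the last step only.
    loss = criterion(output, category_tensor)
    loss.backward()
    optimizer.step()
    # loss.item() replaces the older loss.data[0], which fails on 0-dim tensors.
    return output, loss.item()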

It depends.
If the earlier hidden tensors are needed to compute gradients, then yes, autograd keeps them in memory even after the Python variable points to something else.
If they are not needed to compute gradients, they are freed as soon as you bind a new Tensor to the Python variable.
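
To make the distinction concrete, here is a minimal sketch (not the code from the question; the names w, x, y and the toy multiplication standing in for rnn are made up for illustration). It assumes a recent PyTorch on CPU. The first part rebinds hidden in a loop, which only points the Python name at a new tensor, so backward() should give the usual gradient. The second part uses a genuinely in-place method (add_) on a tensor that autograd saved for the backward pass, which is the case autograd refuses to differentiate through.

```python
import torch

# Part 1: rebinding a name, as in the train() loop above.
# Hypothetical stand-in for the RNN step: hidden_next = hidden * w.
w = torch.tensor(2.0, requires_grad=True)
hidden = torch.tensor(1.0)

for _ in range(3):
    previous = hidden
    hidden = previous * w              # creates a new tensor; `previous` is untouched
    same_object = hidden is previous   # False: the name now refers to a different object

hidden.backward()                      # hidden == w**3, so d(hidden)/dw == 3 * w**2
grad_w = w.grad.item()

# Part 2: a real in-place operation on a tensor autograd needs.
x = torch.tensor(1.0, requires_grad=True)
y = x.exp()                            # exp() saves its output for the backward pass
y.add_(1.0)                            # in-place: overwrites that saved output

try:
    y.backward()
    inplace_error = False
except RuntimeError:
    # autograd detects that a saved tensor was modified and raises an error
    inplace_error = True

print(grad_w, same_object, inplace_error)
```

In this sketch the loop behaves like the one in train(): each step produces a new tensor, and autograd keeps the earlier ones alive only because the multiplication needs them for the backward pass. The in-place case fails loudly with an error rather than silently returning a wrong gradient.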