Detach_() on nodes already in graph

garibarba · June 13, 2017, 9:44pm

I know this does not work right now. But would there be something (computationally) equivalent to detaching a node that is already in the backpropagation graph so that computation does not flow backwards through it?

Right now calling detach_() only affects computations defined/run after the call. Any ideas on how to do this?

For example, this could be used in an RNN so that the loss at each time step is always propagated exactly K steps backwards, instead of 1 step for the first, 2 steps for the second…

ruotianluo · June 15, 2017, 6:01am

Can you try saving all the hidden vectors in a list, and using detach_() to detach the ones that are more than K steps back?

It may not free all the memory, but I guess it will prevent backpropagating through the early steps.

garibarba · June 15, 2017, 12:15pm

Edit: posted ahead of time

garibarba · June 15, 2017, 1:03pm

I have tried exactly that, but no: it seems what matters is whether a Variable is detached at the time it is used in an operation, not whether it gets detached afterwards.

With detach_() applied this way, I get ever-increasing execution times, which probably means I’m backpropagating through the full history of the graph. This is the critical section of the code I’m running (basically char_rnn). next_sequential_sample() generates one character at a time.
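To make the point concrete, here is a minimal sketch of that behavior. It assumes a recent PyTorch where tensors carry autograd state directly (rather than the Variable wrapper used in this thread), and the names `w`, `h`, `y` are made up for illustration. An addition is used on purpose because it does not need to keep `h` around for its backward pass; operations that save their inputs may behave differently depending on the PyTorch version.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Case 1: detach in place AFTER h has already been used.
h = w * 3.0
y = h + 1.0   # the edge from y back to h's grad_fn is recorded here
h.detach_()   # changes h itself, not the edge already stored in y's graph
y.backward()
late_detach_grad = w.grad.item()  # the gradient still reached w

# Case 2: detach BEFORE the value is used.
h2 = w * 3.0
y2 = h2.detach() + 1.0            # y2 has no path back to w
cut_before_use = not y2.requires_grad

print(late_detach_grad, cut_before_use)
```

In the first case the gradient still flows to `w`, which matches the observation that only the detached state at the time of the operation matters.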

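from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

# rnn, next_sequential_sample() and onehot_torch() are defined elsewhere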

train_length = 100

hidden_deque = deque(maxlen=train_length)

criterion = nn.NLLLoss()

optimizer = optim.Adam(rnn.parameters())

def train(n_steps=train_length, hidden_init=None):
    hidden = hidden_init or rnn.initHidden()
            
    for i in range(n_steps):
        sample = next_sequential_sample()
        input_line_tensor = Variable(torch.unsqueeze(onehot_torch(sample[0]), 1))
        target_line_tensor = Variable(torch.from_numpy(sample[1]))

        # Once the window is full, detach the oldest hidden state in place
        if len(hidden_deque) == hidden_deque.maxlen:
            h_to_detach = hidden_deque.popleft()
            for h in h_to_detach:
                h.detach_()
        hidden_deque.append(hidden)

        
        output, hidden = rnn(input_line_tensor[0], hidden)
    
        loss = criterion(output, target_line_tensor[0])
        
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        
    return (hidden[0], hidden[1])

n_train_calls = 100
print_every = 1

hidden = None

for train_iter in range(1, n_train_calls + 1):
          
    hidden = train(train_length, hidden_init=hidden)

@apaszke mentioned in “Delete variable from the graph” that they would add this soon, but I’m not sure if there has been any progress on it.
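In the meantime, one possible workaround (not the detach_() feature asked about above, just a recompute-based sketch) is to keep only detached hidden states and, at every time step, re-run the forward pass over the last K inputs starting from the detached state K steps back. It assumes a recent PyTorch; the toy `cell`, `readout`, sizes, and random data are hypothetical stand-ins for the char_rnn model. It costs up to K cell evaluations per step, but each loss is backpropagated through at most K steps and no graph has to be retained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

K = 3                                   # number of steps each loss is propagated back
cell = nn.RNNCell(4, 5)                 # hypothetical toy recurrent cell
readout = nn.Linear(5, 2)               # hypothetical output layer
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(10, 1, 4)          # 10 time steps, batch size 1
targets = torch.randint(0, 2, (10, 1))

# states[t] is the detached hidden state before consuming inputs[t]
states = [torch.zeros(1, 5)]

for t in range(len(inputs)):
    start = max(0, t - K + 1)
    h = states[start]                   # detached, so gradients stop here
    for s in range(start, t + 1):       # recompute at most K steps
        h = cell(inputs[s], h)
    loss = criterion(readout(h), targets[t])
    loss.backward()                     # gradients accumulate in the parameters
    states.append(h.detach())           # keep only a graph-free copy
```

An optimizer step could go right after `loss.backward()`; the stored states would then come from slightly older parameters, which is a trade-off of this sketch.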

ruotianluo · June 16, 2017, 5:35am

I tried this on some code, but I can’t figure out how to do it either.