I know this does not work right now, but is there something (computationally) equivalent to detaching a node that is already part of the backpropagation graph, so that gradients do not flow backwards through it?
Right now calling detach_() only affects computations defined/run after the call. Any ideas on how to do this?
e.g. this could be used on an RNN so that the loss at each time step is always propagated K steps backwards, instead of 1 step for the first, 2 steps for the second, and so on.
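For context, a minimal sketch of the current behaviour mentioned above, written against the newer tensor API where plain tensors with requires_grad play the role of Variable (none of these names come from the original code): detach_() only cuts connections for operations recorded after the call, not edges that already exist in the graph.

import torch

x = torch.ones(3, requires_grad=True)
y = x * 2
loss_before = y.sum()   # the edge y -> x is already recorded in the graph here

y.detach_()             # in-place detach of y

loss_before.backward()  # gradients still reach x: that part of the graph was
print(x.grad)           # built before the detach -> prints tensor([2., 2., 2.])

z = y * 3               # anything computed from y *after* detach_() is cut off
print(z.requires_grad)  # False, so no gradient would flow back to x from z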
I have tried exactly that, but no luck: it seems that what matters is the “detached” state of a Variable at the time it is used in an operation, not whether it gets detached afterwards.
With that approach I get ever-increasing execution times, which probably means I’m backpropagating through the full history of the graph. This is the critical section of the code I’m running (basically char_rnn); next_sequential_sample() generates one character at a time.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from collections import deque

# rnn, onehot_torch and next_sequential_sample are defined elsewhere (char_rnn setup)
train_length = 100
hidden_deque = deque(maxlen=train_length)  # holds the last train_length hidden states
criterion = nn.NLLLoss()
optimizer = optim.Adam(rnn.parameters())

def train(n_steps=train_length, hidden_init=None):
    hidden = hidden_init or rnn.initHidden()
    for i in range(n_steps):
        sample = next_sequential_sample()
        input_line_tensor = Variable(torch.unsqueeze(onehot_torch(sample[0]), 1))
        target_line_tensor = Variable(torch.from_numpy(sample[1]))
        # Once the deque is full, pop the hidden state from train_length steps
        # ago and detach it in place, hoping to truncate the graph there.
        if len(hidden_deque) == hidden_deque.maxlen:
            h_to_detach = hidden_deque.popleft()
            for h in h_to_detach:
                h.detach_()
        hidden_deque.append(hidden)
        output, hidden = rnn(input_line_tensor[0], hidden)
        loss = criterion(output, target_line_tensor[0])
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
    return (hidden[0], hidden[1])

n_train_calls = 100
print_every = 1
hidden = None
for train_iter in range(1, n_train_calls + 1):
    hidden = train(train_length, hidden_init=hidden)
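For comparison, here is a sketch of the usual chunk-wise truncation (not from the original post; train_windowed is a hypothetical name, and it assumes the same rnn, criterion, optimizer, onehot_torch and next_sequential_sample as above): the hidden state is detached once per train() call, so the graph never grows beyond n_steps and the execution time stays roughly constant across calls.

def train_windowed(n_steps=train_length, hidden_init=None):
    hidden = hidden_init or rnn.initHidden()
    # detach() returns new Variables cut off from the previous window's graph,
    # so backward() below never walks past this boundary.
    hidden = tuple(h.detach() for h in hidden)
    for i in range(n_steps):
        sample = next_sequential_sample()
        input_line_tensor = Variable(torch.unsqueeze(onehot_torch(sample[0]), 1))
        target_line_tensor = Variable(torch.from_numpy(sample[1]))
        output, hidden = rnn(input_line_tensor[0], hidden)
        loss = criterion(output, target_line_tensor[0])
        optimizer.zero_grad()
        # retain_graph is still needed within the window, because later steps
        # backpropagate through the same hidden-state graph again.
        loss.backward(retain_graph=True)
        optimizer.step()
    return (hidden[0], hidden[1])

This gives the usual truncation pattern (1 step back for the first step of a window, up to n_steps for the last), not the constant K-step window asked about above; as far as I can tell, getting a constant window would require re-running the forward pass over the last K characters for every new loss.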