Shouldn’t the truncated example be the following?

    # truncated to the last K timesteps
    for t in range(T):
        out = model(out)
        if T - t == K:
            out.backward()
            out.detach()
    out.backward()
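
To make the question concrete, here is a framework-free sketch of what the cut does to the gradient. It is only an illustration under stated assumptions: the scalar recurrence h <- w * h is a hypothetical stand-in for `model`, the name `truncated_grad` is made up, and the derivative is tracked by hand rather than by autograd. The cut is placed before the step here so that exactly K steps contribute.

```python
def truncated_grad(w, h0, T, K):
    """Return (h_T, dh_T/dw) for the toy recurrence h <- w * h.

    Hypothetical stand-in for `model`; the gradient is cut so that
    only the last K of the T steps contribute to dh_T/dw.
    """
    h = h0
    dh_dw = 0.0  # derivative of the current h with respect to w
    for t in range(T):
        if T - t == K:
            # Plays the role of detaching: from here on, h is treated
            # as a constant, so earlier steps contribute nothing.
            dh_dw = 0.0
        # Product rule for h_new = w * h (uses the old h).
        dh_dw = h + w * dh_dw
        h = w * h
    return h, dh_dw


full = truncated_grad(2.0, 1.0, T=4, K=4)       # no truncation
truncated = truncated_grad(2.0, 1.0, T=4, K=2)  # last 2 steps only
print(full, truncated)
```

The forward value is the same in both calls; only the gradient shrinks when the history is cut. Two things I am unsure about in the snippet I quoted: the cut there happens after `model(out)` at `T - t == K`, which would leave K - 1 steps in the graph rather than K, and as far as I know `out.detach()` returns a new tensor instead of modifying `out`, so it would need `out = out.detach()` or `out.detach_()` to have any effect.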