How to use gradient checkpointing with a packed-sequence RNN in PyTorch

I have a batch of sequences of variable length. To avoid wasting computation on the padding, I pack them with pack_padded_sequence as follows:

import torch

# Pad to a common length, then pack so the GRU skips the padding steps.
input = torch.nn.utils.rnn.pad_sequence(input, batch_first=True)
# lengths holds the original length of each sequence; if it is a
# tensor, it must live on the CPU.
input = torch.nn.utils.rnn.pack_padded_sequence(input,
                                                batch_first=True,
                                                lengths=lengths)
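
For reference, the packed input is a PackedSequence, which is essentially a named tuple of plain tensors (this matters for the error below):

print(type(input))        # <class 'torch.nn.utils.rnn.PackedSequence'>
print(input.data.shape)   # (sum(lengths), num_features): the flattened data
print(input.batch_sizes)  # CPU int64 tensor: batch size at each time step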

Because the sequences are long, I use gradient checkpointing to save memory:

import torch.utils.checkpoint as cp

# Checkpoint the GRU: its activations are recomputed during the
# backward pass instead of being stored.
output, hiddens = cp.checkpoint(self.gru, *(input, hiddens, self.dummy_tensor))
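
(self.dummy_tensor is only there because checkpoint needs at least one input with requires_grad=True to build a backward graph; it is defined in __init__ roughly as follows.)

# A dummy input that requires grad, so checkpoint has at least one
# differentiable input even when the packed input itself does not.
self.dummy_tensor = torch.ones(1, requires_grad=True)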

This fails with the following error:

File ".../src/sequence_models/gru.py", line 86, in forward
    output, hiddens = cp.checkpoint(self.gru, *(input, hiddens, self.dummy_tensor))
  File ".../torch/utils/checkpoint.py", line 177, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
TypeError: CheckpointFunctionBackward.forward: expected Tensor or tuple of Tensor (got PackedSequence) for return value 0

How can I get checkpoint to work with a PackedSequence?
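
My current idea, since the error says checkpoint only supports Tensor return values, is to pass the PackedSequence's fields through checkpoint as plain tensors and rebuild it on both sides. A rough, untested sketch (run_gru is just a helper name I made up; it assumes self.gru accepts a PackedSequence and a hidden state like a plain nn.GRU, and that the input was packed with the default enforce_sorted=True, so sorted_indices is None):

from torch.nn.utils.rnn import PackedSequence

def run_gru(data, batch_sizes, hiddens, dummy):
    # dummy is unused; it only gives checkpoint an input that requires grad.
    # Rebuild the PackedSequence from its tensor fields; batch_sizes is
    # always a CPU int64 tensor and never requires grad.
    packed = PackedSequence(data, batch_sizes)
    packed_output, hiddens = self.gru(packed, hiddens)
    # Return plain tensors only, since checkpoint cannot return a PackedSequence.
    return packed_output.data, hiddens

out_data, hiddens = cp.checkpoint(run_gru, input.data, input.batch_sizes,
                                  hiddens, self.dummy_tensor)
# Re-wrap the checkpointed output into a PackedSequence.
output = PackedSequence(out_data, input.batch_sizes)

Is this the right way to handle it, or is there a cleaner built-in solution?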