Caffe2 variable-length-sequence RNN on GPU with C++ API

I’ve exported an RNN model to Caffe2. It takes batches of variable-length sequences as input, and therefore uses the pack_padded_sequence and pad_packed_sequence functions from torch.nn.utils.rnn.
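To make the input format concrete, here is a minimal pure-Python sketch of what padding a batch of variable-length sequences amounts to. It only illustrates the data layout (a padded batch plus the original lengths) and is not the torch or Caffe2 implementation. The name pad_batch and its padding_value argument are hypothetical, chosen just for this example.

```python
def pad_batch(sequences, padding_value=0.0):
    """Pad variable-length sequences to a common length (batch-first layout).

    Hypothetical helper for illustration only; it is not part of torch or Caffe2.
    Returns (padded, lengths), where `padded` is a list of equal-length lists
    and `lengths` records the original length of each sequence.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths) if lengths else 0
    # Append padding_value to every sequence shorter than the longest one.
    padded = [
        list(seq) + [padding_value] * (max_len - len(seq))
        for seq in sequences
    ]
    return padded, lengths


# Three sequences of lengths 3, 1 and 2.
batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded, lengths = pad_batch(batch)
print(padded)
print(lengths)
```

The lengths list is what lets the model ignore the padded positions later, which is why the exported graph needs an operator that understands per-sequence lengths.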

Using the Caffe2 C++ API, the model runs fine on CPU but cannot run on GPU due to VariableLengthSequencePadding not having a GPU implementation:

Cannot create operator of type 'VariableLengthSequencePadding' on the device 'CUDA'. Verify that implementation for the corresponding device exist.

Looking through pytorch/caffe2/operators/, it seems clear that there is indeed no GPU implementation of that operator.

So… did I miss something? How do people run RNNs on GPU? It seems too common for me to not be able to find a solution. Hopefully I’m overlooking something simple.