How does PyTorch pack LSTM weights to pass to cuDNN?

I have trained a model that uses nn.LSTMCell. For throughput reasons, I want to call cuDNN’s cudnnRNNForwardInference directly, so I have to export the weights of nn.LSTMCell. For other layers, such as linear or convolution, this wasn’t hard. For nn.LSTMCell it is much harder, because nn.LSTMCell has two weight tensors and two bias tensors, while cudnnRNNForwardInference takes only a single weight argument (w, described by wDesc) and has no separate bias argument:

nn.LSTMCell

LSTMCell.weight_ih – the learnable input-hidden weights, of shape (4*hidden_size, input_size)

LSTMCell.weight_hh – the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size)

LSTMCell.bias_ih – the learnable input-hidden bias, of shape (4*hidden_size)

LSTMCell.bias_hh – the learnable hidden-hidden bias, of shape (4*hidden_size)
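
To make the 4*hidden_size dimension concrete: each of these four parameters stacks one block per LSTM gate along its first dimension, in the order input, forget, cell, output (the order PyTorch documents for nn.LSTM, which nn.LSTMCell shares). Below is a minimal sketch of splitting one parameter into its per-gate blocks. split_gates is a hypothetical helper, not a PyTorch function, and the parameter is assumed to have been converted to a plain Python list first, for example with tensor.tolist().

```python
def split_gates(stacked, hidden_size):
    """Split a parameter with 4*hidden_size leading entries into gate blocks.

    `stacked` is a plain list: rows of a weight matrix or scalars of a bias.
    The gate order (input, forget, cell, output) follows PyTorch's convention.
    """
    if len(stacked) != 4 * hidden_size:
        raise ValueError("expected 4*hidden_size leading entries")
    names = ("input", "forget", "cell", "output")
    return {
        name: stacked[k * hidden_size:(k + 1) * hidden_size]
        for k, name in enumerate(names)
    }


# Toy stand-in for weight_ih with hidden_size=2 and input_size=3.
# Every row is filled with its own row index so the blocks are easy to spot.
hidden_size, input_size = 2, 3
weight_ih = [[float(r)] * input_size for r in range(4 * hidden_size)]

gates = split_gates(weight_ih, hidden_size)
for name, rows in gates.items():
    print(name, rows)
```

The same helper works for bias_ih and bias_hh, since only the leading dimension is sliced.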

cudnnRNNForwardInference

cudnnStatus_t cudnnRNNForwardInference(
    cudnnHandle_t                   handle,
    const cudnnRNNDescriptor_t      rnnDesc,
    const int                       seqLength,
    const cudnnTensorDescriptor_t  *xDesc,
    const void                     *x,
    const cudnnTensorDescriptor_t   hxDesc,
    const void                     *hx,
    const cudnnTensorDescriptor_t   cxDesc,
    const void                     *cx,
    const cudnnFilterDescriptor_t   wDesc,
    const void                     *w,
    const cudnnTensorDescriptor_t  *yDesc,
    void                           *y,
    const cudnnTensorDescriptor_t   hyDesc,
    void                           *hy,
    const cudnnTensorDescriptor_t   cyDesc,
    void                           *cy,
    void                           *workspace,
    size_t                          workSpaceSizeInBytes)

How should I pack weight_ih, weight_hh, bias_ih, and bias_hh so that they can be passed to cuDNN’s cudnnRNNForwardInference as the w parameter (described by wDesc)?
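
For concreteness, here is a sketch of one candidate packing, not a confirmed answer. It assumes a single-layer, unidirectional LSTM with both bias vectors enabled, row-major matrices, and that cuDNN places all weight matrices before all biases in the same gate order PyTorch uses. These are assumptions; the offsets reported by cudnnGetRNNLinLayerMatrixParams and cudnnGetRNNLinLayerBiasParams for a given rnnDesc are authoritative and should be checked against this layout. pack_lstm_cell_params is a hypothetical helper operating on plain Python lists.

```python
def pack_lstm_cell_params(weight_ih, weight_hh, bias_ih, bias_hh):
    """Flatten the four LSTMCell parameters into one contiguous list.

    ASSUMED layout (verify against cuDNN's reported offsets):
    weight_ih rows, then weight_hh rows, then bias_ih, then bias_hh.
    """
    flat = []
    for matrix in (weight_ih, weight_hh):
        for row in matrix:          # row-major flattening
            flat.extend(row)
    for bias in (bias_ih, bias_hh):
        flat.extend(bias)
    return flat


# Toy parameters with hidden_size=1 and input_size=2.
h, i = 1, 2
w_ih = [[1.0] * i for _ in range(4 * h)]   # shape (4*h, i)
w_hh = [[2.0] * h for _ in range(4 * h)]   # shape (4*h, h)
b_ih = [3.0] * (4 * h)                     # shape (4*h,)
b_hh = [4.0] * (4 * h)                     # shape (4*h,)

flat = pack_lstm_cell_params(w_ih, w_hh, b_ih, b_hh)
print(len(flat))  # 4*h*(i + h) + 8*h elements in total
```

Since nn.LSTMCell computes a single time step, a matching cuDNN call would presumably use seqLength = 1.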