I have trained a model that uses nn.LSTMCell. For throughput reasons, I want to call cuDNN’s cudnnRNNForwardInference directly, so I have to export the weights of nn.LSTMCell. For other layers, such as linear or convolution, this wasn’t hard. For nn.LSTMCell it is much harder, because nn.LSTMCell holds two sets of weights and two sets of biases, while cudnnRNNForwardInference takes only a single weight argument (w, described by wDesc) and no separate bias argument:
nn.LSTMCell
LSTMCell.weight_ih – the learnable input-hidden weights, of shape (4*hidden_size, input_size)
LSTMCell.weight_hh – the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size)
LSTMCell.bias_ih – the learnable input-hidden bias, of shape (4*hidden_size)
LSTMCell.bias_hh – the learnable hidden-hidden bias, of shape (4*hidden_size)
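To make the 4*hidden_size leading dimension concrete: the PyTorch documentation describes each of these tensors as four per-gate blocks stacked along the first axis, in the order input (i), forget (f), cell (g), output (o). Here is a minimal sketch of splitting them apart. The helper name split_gates and the stand-in NumPy arrays are hypothetical; with a real cell the arrays could be obtained with something like cell.weight_ih.detach().cpu().numpy().

```python
import numpy as np

# Gate stacking order documented for nn.LSTMCell / nn.LSTM parameters.
GATE_ORDER = ("i", "f", "g", "o")


def split_gates(param, hidden_size):
    """Split a stacked (4*hidden_size, ...) parameter into per-gate blocks.

    Hypothetical helper for illustration; works for weights and biases.
    """
    if param.shape[0] != 4 * hidden_size:
        raise ValueError("first dimension must be 4*hidden_size")
    chunks = np.split(param, 4, axis=0)  # four equal blocks along axis 0
    return dict(zip(GATE_ORDER, chunks))


# Stand-in arrays with the documented shapes (not real trained weights).
input_size, hidden_size = 3, 2
weight_ih = np.arange(4 * hidden_size * input_size, dtype=np.float32).reshape(
    4 * hidden_size, input_size
)
bias_ih = np.arange(4 * hidden_size, dtype=np.float32)

gates_w = split_gates(weight_ih, hidden_size)  # each block: (hidden_size, input_size)
gates_b = split_gates(bias_ih, hidden_size)    # each block: (hidden_size,)
```

Each block in gates_w is the input-to-hidden matrix for one gate, which is the granularity at which cuDNN addresses its own linear layers.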
cudnnRNNForwardInference
cudnnStatus_t cudnnRNNForwardInference(
    cudnnHandle_t handle,
    const cudnnRNNDescriptor_t rnnDesc,
    const int seqLength,
    const cudnnTensorDescriptor_t *xDesc,
    const void *x,
    const cudnnTensorDescriptor_t hxDesc,
    const void *hx,
    const cudnnTensorDescriptor_t cxDesc,
    const void *cx,
    const cudnnFilterDescriptor_t wDesc,
    const void *w,
    const cudnnTensorDescriptor_t *yDesc,
    void *y,
    const cudnnTensorDescriptor_t hyDesc,
    void *hy,
    const cudnnTensorDescriptor_t cyDesc,
    void *cy,
    void *workspace,
    size_t workSpaceSizeInBytes);
How would I pack the four tensors weight_ih, weight_hh, bias_ih, and bias_hh so that they can be passed to cuDNN’s cudnnRNNForwardInference as the w parameter (described by wDesc)?
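One possible starting point, offered as a sketch rather than a confirmed answer: flatten the four tensors into a single contiguous float32 buffer on the Python side and copy that buffer to the device as w. The layout used below (all of weight_ih, then weight_hh, then bias_ih, then bias_hh, each in row-major order) is an assumption for a single-layer, unidirectional LSTM with the standard algorithm and CUDNN_RNN_DOUBLE_BIAS. The function name pack_lstm_cell_params is hypothetical. The authoritative offsets should come from cudnnGetRNNLinLayerMatrixParams and cudnnGetRNNLinLayerBiasParams, and the buffer size from cudnnGetRNNParamsSize, so the result should be checked against those before it is trusted.

```python
import numpy as np


def pack_lstm_cell_params(weight_ih, weight_hh, bias_ih, bias_hh):
    """Flatten LSTMCell parameters into one float32 buffer.

    ASSUMED layout (verify against cuDNN's own offset queries):
    [weight_ih | weight_hh | bias_ih | bias_hh], each row-major.
    """
    hidden_size = weight_hh.shape[1]
    input_size = weight_ih.shape[1]
    expected = {
        "weight_ih": (4 * hidden_size, input_size),
        "weight_hh": (4 * hidden_size, hidden_size),
        "bias_ih": (4 * hidden_size,),
        "bias_hh": (4 * hidden_size,),
    }
    actual = {
        "weight_ih": weight_ih.shape,
        "weight_hh": weight_hh.shape,
        "bias_ih": bias_ih.shape,
        "bias_hh": bias_hh.shape,
    }
    for name, shape in expected.items():
        if tuple(actual[name]) != shape:
            raise ValueError(f"{name} has shape {actual[name]}, expected {shape}")

    parts = (weight_ih, weight_hh, bias_ih, bias_hh)
    # ravel() on a C-contiguous array yields row-major element order.
    flat = np.concatenate(
        [np.ascontiguousarray(p, dtype=np.float32).ravel() for p in parts]
    )
    return flat


# Stand-in arrays (not real trained weights); with PyTorch, each one could be
# obtained via e.g. cell.weight_ih.detach().cpu().numpy().
input_size, hidden_size = 3, 2
rng = np.random.default_rng(0)
w_ih = rng.standard_normal((4 * hidden_size, input_size)).astype(np.float32)
w_hh = rng.standard_normal((4 * hidden_size, hidden_size)).astype(np.float32)
b_ih = rng.standard_normal(4 * hidden_size).astype(np.float32)
b_hh = rng.standard_normal(4 * hidden_size).astype(np.float32)

packed = pack_lstm_cell_params(w_ih, w_hh, b_ih, b_hh)
# Element count: 4*H*(I+H) weights plus 8*H biases.
n_params = 4 * hidden_size * (input_size + hidden_size) + 8 * hidden_size
```

A quick sanity check is to compare packed.size * 4 bytes with the value cudnnGetRNNParamsSize reports for the same rnnDesc and input descriptor. A mismatch would mean the assumed layout or bias mode does not apply to that configuration.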