How to create a new layer with cuDNN?

Hi,

I have changed the RNN module into a new one, RNN1. Everything works fine if I disable cuDNN, but once cuDNN is on, there is an error. It seems that I have to define it in a cuDNN file? How do I do that?

cuDNN only supports a handful of well-known RNN cell types. If you want to implement your own RNN cell, modifying or subclassing RNNBase doesn’t really work. Instead, you should subclass RNNCellBase and write something that looks like nn.RNNCell (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py#L353). Then use a for-loop to unroll the RNN cell manually, as in the sketch below.
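A minimal sketch of that pattern. `RNN1Cell` and `unroll` are hypothetical names, the tanh update is only a placeholder for your own cell math, and a plain nn.Module is used here for brevity where the advice above says to subclass RNNCellBase:

```python
import torch
import torch.nn as nn

class RNN1Cell(nn.Module):
    """Hypothetical stand-in for a custom cell; replace the tanh
    update in forward() with your own RNN1 math."""
    def __init__(self, input_size, hidden_size):
        super(RNN1Cell, self).__init__()
        self.ih = nn.Linear(input_size, hidden_size)
        self.hh = nn.Linear(hidden_size, hidden_size)

    def forward(self, input, hx):
        return torch.tanh(self.ih(input) + self.hh(hx))

def unroll(cell, inputs, h0):
    # inputs: (seq_len, batch, input_size). Unrolling with a plain
    # Python for-loop means cuDNN is never involved.
    h = h0
    outputs = []
    for t in range(inputs.size(0)):
        h = cell(inputs[t], h)
        outputs.append(h)
    return torch.stack(outputs), h
```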


Thanks.
I have compared the performance of the RNN with and without cuDNN. It seems that with cuDNN, the RNN runs about 10x faster. Is this really the case?
I have also implemented the same RNN1 code in Lasagne, based on Theano. It seems to run at a similar speed to PyTorch’s RNN with cuDNN.
What is the difference in how Theano and PyTorch use cuDNN? Is it used automatically in Theano?

What are the hidden size and sequence length of your RNN? We’ve seen a 10x gap before, but only for very small hidden sizes and very long sequences. Since cuDNN can’t run custom cells, what Theano is most likely doing is using its graph optimizer to fuse multiple pointwise CUDA kernels into one. PyTorch currently has inefficient kernel launches because of excess getDevice calls (https://github.com/pytorch/pytorch/issues/917), so it should be less bad after those are fixed. Eventually we’ll want to provide a way to create custom fused pointwise kernels for things like user-defined RNN cells, at which point it’ll be as fast as Theano for your use case.
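For readers coming to this later: pointwise fusion of this kind did eventually land in PyTorch via the TorchScript JIT fuser. A minimal sketch, assuming a recent PyTorch release; `rnn1_pointwise` is a hypothetical name, and the matmuls producing `igate`/`hgate` would happen outside the fused region:

```python
import torch

@torch.jit.script
def rnn1_pointwise(igate, hgate, bias):
    # The adds and tanh are all pointwise ops, which the JIT fuser
    # can combine into a single CUDA kernel instead of one per op.
    return torch.tanh(igate + hgate + bias)
```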

Here are the settings used for running the RNN:
hidden size: 128
length: 784 (pixel mnist)
batch size: 32
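For reference, a minimal timing sketch for this kind of comparison, assuming pixel-MNIST-style input of size 1 and a CUDA device; `bench` is a hypothetical helper:

```python
import time
import torch
import torch.nn as nn

def bench(use_cudnn, hidden_size=128, seq_len=784, batch=32, input_size=1):
    # Toggling this flag makes nn.RNN fall back to the non-cuDNN path.
    torch.backends.cudnn.enabled = use_cudnn
    rnn = nn.RNN(input_size, hidden_size).cuda()
    x = torch.randn(seq_len, batch, input_size, device='cuda')
    rnn(x)  # warm-up so kernel compilation isn't timed
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(10):
        rnn(x)
    torch.cuda.synchronize()
    return (time.time() - t0) / 10

print('cuDNN on :', bench(True))
print('cuDNN off:', bench(False))
```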

I look forward to those new features; they would be very helpful.
Thanks.

Any updates on this?