Optimizing CUDA memory pipeline for RNN

apaszke · May 22, 2017, 2:30pm

Note that you only need to make either the input or the hidden state volatile, not both (the flag will propagate through the rest of the graph with very high precedence).
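
To illustrate the propagation rule, here is a toy model in plain Python. It is only a sketch and not PyTorch code: `Var` and `op` are hypothetical names made up for this example. It assumes the rule that one volatile input makes the output volatile, and that a volatile output never requires grad even if other inputs do.

```python
from dataclasses import dataclass


@dataclass
class Var:
    """Hypothetical stand-in for a graph node; not a PyTorch class."""
    requires_grad: bool = False
    volatile: bool = False


def op(*inputs: Var) -> Var:
    """Hypothetical operation that only combines the flags of its inputs."""
    # A single volatile input is enough to make the output volatile.
    volatile = any(v.volatile for v in inputs)
    # A volatile output never requires grad, whatever the other inputs say.
    requires_grad = (not volatile) and any(v.requires_grad for v in inputs)
    return Var(requires_grad=requires_grad, volatile=volatile)


# RNN-like loop: the weight requires grad, and only the input is volatile.
weight = Var(requires_grad=True)
x = Var(volatile=True)
hidden = Var()  # initial hidden state, not marked volatile

for _ in range(3):
    hidden = op(x, hidden, weight)

print(hidden)
```

In this sketch the hidden state becomes volatile after the first step and stays that way, even though `weight` requires grad, which is why marking a single input is enough.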