No grad accumulator for a saved leaf! error for layernorm script

Hi,
I tried the script from [Optimizing CUDA Recurrent Neural Networks with TorchScript](https://pytorch.org/blog/optimizing-cuda-rnn-with-torchscript/), and then tried to run backward by simply adding `out.sum().backward()` below [this line](https://github.com/pytorch/pytorch/blob/cbcb2b5ad767622cf5ec04263018609bde3c974a/benchmarks/fastrnns/custom_lstms.py#L454), but I got the error "No grad accumulator for a saved leaf!".
Does anyone know how to solve it?
My PyTorch version: 1.1.0, installed from conda.
Thanks.