Incompatible gradient size after using jit.script

Hi! I have a set of models, and some time ago I decided to speed up training by applying torch.jit.script to them before running the training loop. For most models this works just fine, but for one model (an implementation of SCINet) I noticed a weird issue:

Scripting succeeds, and training runs fine on the CPU. However, when I use a CUDA device, I get an error like this for some inputs:

RuntimeError: Function torch::jit::(anonymous namespace)::DifferentiableGraphBackward returned an invalid gradient at index 1 - got [6, 1, 1] but expected shape compatible with [6, 1, 6]

Interestingly, I have many tests in a pipeline (positive-path tests that all passed before), but only some of them started failing after I scripted the model.

The source code is here: Pastebin.com (the paste begins with: from typing import Optional, Tuple, Union / import numpy as np / import torch)

Is there any way to get more information about such errors, so that I can debug the issue?
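One way I could try to narrow this down is to compare the gradient shapes from the eager and the scripted model over the input shapes my tests use. Below is a minimal sketch of that idea. TinyModel, grad_shapes and compare are hypothetical names made up for illustration; TinyModel is only a stand-in with a broadcast multiply, not the actual SCINet code, and the shapes [6, 1, 1] and [6, 1, 6] are just taken from the error message.

```python
from typing import List, Sequence, Tuple

import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the real model (NOT the SCINet code)."""

    def __init__(self) -> None:
        super().__init__()
        # Broadcasts against inputs whose last dimension is 1 or 6.
        self.scale = nn.Parameter(torch.ones(1, 1, 6))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(x * self.scale)


def grad_shapes(model: nn.Module, x: torch.Tensor) -> List[Tuple[int, ...]]:
    """Return the gradient shapes for the input and every parameter."""
    x = x.clone().requires_grad_(True)
    out = model(x)
    grads = torch.autograd.grad(out.sum(), [x] + list(model.parameters()))
    return [tuple(g.shape) for g in grads]


def compare(
    eager: nn.Module,
    scripted: nn.Module,
    shapes: Sequence[Tuple[int, ...]],
    repeats: int = 3,
) -> List[Tuple[Tuple[int, ...], str]]:
    """Collect every input shape where the scripted model fails or disagrees."""
    device = next(eager.parameters()).device
    problems: List[Tuple[Tuple[int, ...], str]] = []
    for shape in shapes:
        # Repeat each shape a few times, since the scripted model may only
        # switch to an optimized graph after some warm-up calls.
        for _ in range(repeats):
            x = torch.randn(*shape, device=device)
            expected = grad_shapes(eager, x)
            try:
                got = grad_shapes(scripted, x)
            except RuntimeError as err:
                problems.append((tuple(shape), str(err)))
                continue
            if got != expected:
                problems.append((tuple(shape), f"got {got}, expected {expected}"))
    return problems


if __name__ == "__main__":
    eager = TinyModel()  # add .to("cuda") to test on the GPU
    scripted = torch.jit.script(eager)
    for shape, message in compare(eager, scripted, [(6, 1, 1), (6, 1, 6)]):
        print(shape, message)
```

Since the failure only shows up on CUDA for me, the model would need to be moved to the GPU before scripting. Enabling torch.autograd.set_detect_anomaly(True) around the failing backward pass might also give a traceback of the forward call that produced the bad gradient, though I am not sure how much detail it provides for scripted graphs.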