Compute the Jacobian wrt the input for each layer

I am working on regularisation methods and I want do the following:

To compute my regularisation for each layer, i need the product of the jacobian with the weights-operation with regards to the input. In formulas:

let l be the current layer and f be the function of the neural net so far,
so that the output of layer l is l(f(x)) for the input x
If l(x)=relu(Ax+b), then I need A(J_f(x)),
and if l(x)=relu(convolution(x)+b), then I need convolution(xJ_f(x))

I wonder how I can do this and share computation? I also need to be able to backgrob through the jacobian, it’s used in the loss. Since this is an early research-idea, efficiency is not super important as long as it’s somewhat manageable. I have found this gist: · GitHub. I see a “create_graph=False”, so if I set create_graph to True (to be able to backgrob through it), does it also share computation? I think what might improve things is forward-mode differentiation wrt to the input during the forward-propagation step of reverse-mode differentiation, but I worry about memory requirements.

It may be easier or way more efficient, I am not sure, to compute the weight-jacobian product directly, but I have not found out how so far.