I am working on regularisation methods and I want to do the following:

To compute my regularisation term for each layer, I need the product of the layer's weight operation with the Jacobian of the network so far, taken with respect to the input. In formulas:

let l be the current layer and f be the function of the neural net so far,

so that the output of layer l is l(f(x)) for the input x.

If l(x) = relu(Ax + b), then I need A · J_f(x),

and if l(x) = relu(convolution(x) + b), then I need convolution(J_f(x)), i.e. the convolution applied to each column of J_f(x).
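For the linear case, here is a minimal sketch of what I mean, using `torch.autograd.functional.jacobian` (the two-layer net `f`, the matrix `A`, and the sizes are just made-up stand-ins):

```python
import torch

torch.manual_seed(0)

# Dummy stand-ins: f is "the net so far", A is the current layer's weight
f = torch.nn.Sequential(torch.nn.Linear(3, 4), torch.nn.Tanh())
A = torch.randn(2, 4)
x = torch.randn(3)

# J_f(x) has shape (4, 3); create_graph=True makes the Jacobian itself
# differentiable, so it can be used inside the loss
J = torch.autograd.functional.jacobian(f, x, create_graph=True)

AJ = A @ J                # the product A . J_f(x), shape (2, 3)
loss = AJ.pow(2).sum()    # toy loss on the Jacobian product
loss.backward()           # gradients reach f's parameters
```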

I wonder how I can do this while sharing computation. I also need to be able to backprop through the Jacobian, since it is used in the loss. As this is an early research idea, efficiency is not super important as long as it is somewhat manageable. I have found this gist: jacobian_hessian.py · GitHub. I see a *create_graph=False* argument there, so if I set *create_graph* to *True* (to be able to backprop through the result), does it also share computation? I think what might improve things is forward-mode differentiation with respect to the input during the forward-propagation step of reverse-mode differentiation, but I worry about the memory requirements.
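To make the forward-mode idea concrete, this per-input-direction jvp loop is roughly what I imagine (again with a dummy `f`; note that `torch.autograd.functional.jvp` is actually implemented via double backward, while `torch.func.jvp` would be true forward mode):

```python
import torch

torch.manual_seed(0)

# Dummy stand-in for "the net so far"
f = torch.nn.Sequential(torch.nn.Linear(3, 4), torch.nn.Tanh())
x = torch.randn(3)

# One jvp per input dimension gives one column of J_f(x);
# create_graph=True keeps everything differentiable for the loss
cols = []
for i in range(x.numel()):
    v = torch.zeros_like(x)
    v[i] = 1.0
    _, col = torch.autograd.functional.jvp(f, x, v, create_graph=True)
    cols.append(col)

J = torch.stack(cols, dim=1)  # full Jacobian, shape (4, 3)
```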

It may be easier, or way more efficient (I am not sure), to compute the weight-Jacobian product directly, but I have not found out how so far.
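One idea I have been toying with for the direct product: row i of A · J_f(x) equals the vector-Jacobian product aᵢᵀ J_f(x), so one backward pass per row of A would suffice and J_f(x) would never be materialised. A sketch (same dummy `f` and `A` as above, not sure this is the best way):

```python
import torch

torch.manual_seed(0)

# Dummy stand-ins as before
f = torch.nn.Sequential(torch.nn.Linear(3, 4), torch.nn.Tanh())
A = torch.randn(2, 4)
x = torch.randn(3, requires_grad=True)

# Each row a_i of A acts as the cotangent in a vjp: a_i^T J_f(x);
# create_graph=True keeps the result differentiable for the loss,
# retain_graph=True allows reusing the forward graph for every row
y = f(x)
rows = [torch.autograd.grad(y, x, grad_outputs=a,
                            retain_graph=True, create_graph=True)[0]
        for a in A]

AJ = torch.stack(rows)  # the product A . J_f(x), shape (2, 3)
```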