Hello,
I’m working on a Physics-Informed Neural Network (PINN) and I need to take the derivatives of the outputs w.r.t. the inputs and use them in the loss function.
The issue is related to the neural network’s multiple outputs. I tried to use ‘autograd.grad’ to calculate the derivatives of the outputs, but it sums all the contributions.
For example, if my output ‘u’ has shape [batch_size, n_output], the derivative ‘dudx’ has shape [batch_size, 1], instead of [batch_size, n_output].
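Here is a toy snippet (not my real network, just two hand-written outputs) that reproduces what I mean:
import torch

x = torch.rand(4, 1, requires_grad=True)        # [batch_size, 1]
u = torch.cat([x ** 2, x ** 3], dim=1)          # [batch_size, n_output=2]
dudx = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
print(dudx.shape)  # torch.Size([4, 1]): 2*x + 3*x**2 summed together, not one column per output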
Due to the sum, I can’t use the derivatives in the loss function. I tried using a for loop to calculate each derivative separately, but then training takes forever. Do you have any idea how to solve this problem? Thanks in advance
You could have a look at using torch.func.jacrev and torch.func.vmap to compute the entire Jacobian of your network.
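For example, something along these lines (stand-in model and shapes just to sketch the idea, adapt it to your network):
import torch
from torch.func import jacrev, vmap

# Stand-in network: 2 inputs (x, y) -> 3 outputs.
model = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3))

def f(inp):                    # inp is a single sample of shape [2]
    return model(inp)          # output of shape [n_output]

inputs = torch.rand(16, 2)     # [batch_size, 2]
jac = vmap(jacrev(f))(inputs)  # [batch_size, n_output, 2], nothing is summed over the outputs
dudx = jac[..., 0]             # [batch_size, n_output]
dudy = jac[..., 1]             # [batch_size, n_output]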
Also, please share a minimal reproducible example to help explain your problem.
Hi,
thanks, you are right, I should have posted some code.
Below is the code; my loss function contains high-order derivatives of the outputs with respect to the inputs x and y:
import torch

def gradient(y, x, grad_outputs=None):
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
    return grad
def compute_derivatives(x, y, u):
    dudx = gradient(u, x)
    dudy = gradient(u, y)
    dudxx = gradient(dudx, x)
    dudyy = gradient(dudy, y)
    dudxxx = gradient(dudxx, x)
    dudxxy = gradient(dudxx, y)
    dudyyy = gradient(dudyy, y)
    dudxxxx = gradient(dudxxx, x)
    dudxxyy = gradient(dudxxy, y)
    dudyyyy = gradient(dudyyy, y)
    return dudxx, dudyy, dudxxxx, dudyyyy, dudxxyy
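For context, this is roughly how I call it (the model here is just a small placeholder for my actual network):
model = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 4))
x = torch.rand(128, 1, requires_grad=True)
y = torch.rand(128, 1, requires_grad=True)
u = model(torch.cat([x, y], dim=1))   # u has shape [batch_size, n_output]
# every returned derivative has shape [batch_size, 1] because the outputs get summed
dudxx, dudyy, dudxxxx, dudyyyy, dudxxyy = compute_derivatives(x, y, u)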
Also, I already tried vmap with a simplified version of the code, but it gives back the following error:
import torch
from functorch import vmap
x = torch.tensor([1.0, 2.0], requires_grad=True)
out = torch.stack([x * 2, x * 3], dim=0)
print('x:', x)
print('out:', out)
def single_gradient(out_row, x):
    grad_outputs = torch.ones_like(out_row)
    return torch.autograd.grad(out_row, [x], grad_outputs=grad_outputs, create_graph=True, retain_graph=True)[0]
batched_grad = vmap(single_gradient, (0, None))(out, x)
print('Batched Grads:', batched_grad)
“RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn”
What do you think about this?
The Jacobian requires a function as input, but what I have is the output tensor with shape [batch_size, N] from the neural network.
Given that I do not have a function, can I still use the Jacobian?
Using the torch.func namespace requires you to re-write how you compute the gradients entirely; you can’t mix and match between torch.autograd and torch.func, especially when using torch.func.vmap (at least to my knowledge).
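As a rough sketch of what that rewrite could look like for the second derivatives (again with a placeholder model and shapes, so treat it as a starting point rather than a drop-in solution):
import torch
from torch.func import jacrev, vmap

# Placeholder network: 2 inputs (x, y) -> 3 outputs.
model = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3))

def u_fn(xy):                                # xy is a single point of shape [2]
    return model(xy)                         # shape [n_output]

points = torch.rand(128, 2)                  # [batch_size, 2]
# jacrev applied twice gives all second derivatives per sample; vmap batches over the points.
# Higher orders (e.g. your fourth derivatives) follow by nesting more jacrev calls.
d2u = vmap(jacrev(jacrev(u_fn)))(points)     # [batch_size, n_output, 2, 2]
dudxx = d2u[:, :, 0, 0]                      # [batch_size, n_output]
dudyy = d2u[:, :, 1, 1]                      # [batch_size, n_output]
dudxy = d2u[:, :, 0, 1]                      # mixed derivative, also one column per output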
I have some previous examples on the forums of how to compute gradients with a ‘functional’ approach here: