In order to compute gradients of a function within the PyTorch framework, you need to either 1) use torch functions so that autograd can track the operations, or 2) define a custom torch.autograd.Function and implement its backward pass manually so that autograd can compute gradients.
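For the second option, a minimal sketch of a custom torch.autograd.Function (the Square op and its 2 * x backward are illustrative examples, not from your code):

import torch

class Square(torch.autograd.Function):
    # Custom op: forward computes x ** 2, backward supplies its gradient.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output  # d(x**2)/dx = 2x, chained with the incoming gradient

x = torch.randn(3, requires_grad=True)
Square.apply(x).sum().backward()
print(x.grad)  # should equal 2 * x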
I want to use torch.func.grad to calculate the gradient, not autograd. The numpy function can be compiled to torch code, so I don't think I need to implement a backward pass manually?
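For context, this is the kind of functional call I have in mind; on a plain torch function it needs no hand-written backward (a toy example, not one of my actual energy calculators):

import torch

def f(x):
    return (x ** 2).sum()  # ordinary torch ops, no custom backward anywhere

g = torch.func.grad(f)(torch.tensor([1., 2., 3.]))
print(g)  # tensor([2., 4., 6.])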
I still think you need to define the backward manually. Can you take the grad of a function compiled from numpy?
You could always try doing torch.func.grad(compiled_fn) and seeing what happens, but I suspect it won't work, as torch.func operates on PyTorch primitives.
The error raised is: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead. So I want to ask whether this is just not supported yet, or whether I cannot do it at all.
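For reference, that RuntimeError is the generic one raised whenever .numpy() is called on a tensor that requires grad, independent of torch.func; a minimal reproduction:

import torch

t = torch.ones(3, requires_grad=True)
# t.numpy()          # RuntimeError: Can't call numpy() on Tensor that requires grad.
t.detach().numpy()   # detaching first works, but breaks the gradient connection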
Although this snippet cannot run on 2.4.0-dev (I think it also cannot run on a stable version, because numpy_fn's return value is not a scalar), it looks more reasonable.
I need to use torch.func.grad, not Tensor.grad, because it is more convenient and makes it easy to compute higher-order derivatives. I think it should work, but it does not:
import numpy as np
import torch

@torch.compile(fullgraph=True)
@torch.compiler.wrap_numpy
def numpy_fn(X, Y):
    return np.sum(X[:, :, None] * Y[:, None, :], axis=(-2, -1))

X = torch.randn(1024, 64, device="cuda")
Y = torch.randn(1024, 64, device="cuda")
Z = torch.func.grad(numpy_fn)(X, Y)  # use func.grad, not Tensor.grad, since we may also use jvp etc.
assert isinstance(Z, torch.Tensor)
assert Z.device.type == "cuda"
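Note that, since numpy_fn returns a length-1024 vector and torch.func.grad expects a scalar output, a reduction would be needed in any case; a hedged sketch (scalar_fn is a hypothetical helper, not from the snippet above, and on its own it does not resolve the dynamo error discussed below):

def scalar_fn(X, Y):
    # reduce the per-row result to a scalar so torch.func.grad is applicable
    return numpy_fn(X, Y).sum()

dX = torch.func.grad(scalar_fn)(X, Y)  # gradient w.r.t. X only (argnums=0 by default)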
Perhaps torch.func can only handle PyTorch primitives, whereas the .grad approach works fine? It might be best to get a dev's opinion: @vfdev-5 (apologies for the tag).
When running the script, the error message I get is:
torch._dynamo.exc.Unsupported: torch.func.grad(fn) requires the function to be inlined by dynamo
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
To be able to use torch.func over NumPy, or compiled functions in general, the relevant torch.func calls should also be within torch.compile.
As a side note, fullgraph=True does not really affect this process; it has no side effects. It's there just to make sure we get a single graph. wrap_numpy is meant to be used within torch.compile.
Taking all this into account, we get this script, which does compile and does what you’d expect:
import numpy as np
import torch
from torch.func import grad

# NumPy function, made traceable by torch.compile via wrap_numpy
@torch.compiler.wrap_numpy
def fn(x):
    return np.power(x, 2).sum()

# The torch.func.grad call itself lives inside torch.compile
@torch.compile
def my_grad(x):
    return torch.func.grad(fn)(x)

arr = torch.tensor([1., 2., 3.], requires_grad=True)
print(my_grad(arr))
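For what it's worth, the printed gradient of sum(x ** 2) at [1., 2., 3.] should be tensor([2., 4., 6.]). And since the original motivation was higher-order derivatives, here is a small pure-torch sanity check showing that torch.func transforms compose (no numpy or wrap_numpy involved; I have not verified how these compose with wrap_numpy under torch.compile):

import torch

def cube_sum(x):
    return (x ** 3).sum()

# Hessian of sum(x_i ** 3) is diag(6 * x_i)
H = torch.func.hessian(cube_sum)(torch.tensor([1., 2.]))
print(H)  # expect [[6., 0.], [0., 12.]]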
@Lezcano @AlphaBetaGamma96 Thanks a lot for your patient help! I had always thought this strategy would work and that there was only a slight bug. I have a lot of numpy energy calculators that I want to reuse in the NNP package, which is why I am asking this ridiculous question. After it is fixed, I will try again!