Don't understand why only Tensors of floating point dtype can require gradients

import torch
from torch.autograd.functional import jacobian

def get_f(x):
    t1 = x[0]**2
    t2 = x[1]**3
    t3 = x[2]**4
    f[0] = t1 + t2 + t3
    f[1] = 2*(t1 + t2 + t3)
    f[2] = 3*(t1 + t2 + t3)

    return f

x = torch.arange(0, 3)
f = torch.arange(0, 3)
J = jacobian(get_f, x).detach().numpy()

What I want is for J to be a 3x3 matrix:

df[0]/dx[0]   df[0]/dx[1]  df[0]/dx[2]
df[1]/dx[0]   df[1]/dx[1]  df[1]/dx[2]
df[2]/dx[0]   df[2]/dx[1]  df[2]/dx[2]

But I encounter this error:
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
I believe requires_grad is True by default.

Is there no way to differentiate an int64 tensor (the default dtype of torch.arange for integer arguments), or am I doing something wrong?

I’m not sure whether, or how, gradients would be defined on “integer functions”. Take a simple example, f(x) = x**2.
For floating point numbers you would see:
[image: plot of f(x) = x**2 for floating point x]

and can draw the gradient directly into the plot.

But I guess this is how this simple function would look if only integers are used:
[image: plot of f(x) = x**2 at integer x only, drawn as isolated dots]

Would this mean that the gradient is 0 everywhere except at the dots (where it would then be +/- Inf)?
If so, then I don’t think it would make sense to allow Autograd to accept integer values.
In case you are expecting integer outputs, it might be better to round the result.
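A minimal sketch of that suggestion, assuming the goal is the 3x3 Jacobian from the original post: create the input with a floating point dtype, build the output with torch.stack instead of writing into a global integer tensor, and round afterwards only if integers are really needed. The variable name J_int is introduced here purely for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def get_f(x):
    # Sum of powers of the three inputs; every op here is differentiable.
    t = x[0]**2 + x[1]**3 + x[2]**4
    # Build the output with torch.stack rather than writing into a
    # pre-allocated tensor, so autograd can track every entry.
    return torch.stack([t, 2 * t, 3 * t])

# A floating point input is what allows autograd to track gradients.
x = torch.arange(0, 3, dtype=torch.float32)

J = jacobian(get_f, x)              # shape (3, 3), J[i, j] = df[i]/dx[j]
J_int = J.round().to(torch.int64)   # optional: round if integers are wanted
print(J)
```

For x = [0., 1., 2.] the first row should be [2*x[0], 3*x[1]**2, 4*x[2]**3] = [0., 3., 32.], and the second and third rows are 2 and 3 times that.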

Also, I’m sure other users such as @KFrank, @tom, and @albanD might have a better explanation.


Thanks. A follow-up question, and maybe I am wrong: is AD in PyTorch doing finite differences, and is that why the function needs to be continuous? I am new to AD, so this may be a silly question, but why can’t it just compute dx**2/dx = 2x?

To comment on the dx**2/dx = 2x question: AD isn’t computing a symbolic expression for the gradient. It applies the chain rule to the elementary operations in your function and evaluates the derivative numerically at the input you pass in. If you want to read a paper on it, I’d recommend Automatic Differentiation in Machine Learning: a Survey, which explains why AD is neither symbolic differentiation nor numerical differentiation.
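As a rough illustration of that distinction, the sketch below compares three things for f(x) = sin(x**2): the gradient autograd returns, the chain rule applied by hand, and a central finite difference. The function, the step size eps, and the variable names are assumptions chosen for the example, not anything from the thread.

```python
import torch

# float64 keeps the finite-difference rounding error small.
x = torch.tensor([0.5, 1.0, 1.5], dtype=torch.float64, requires_grad=True)

# Reverse-mode AD: autograd records sin and pow, then applies the chain rule.
y = torch.sin(x**2).sum()
(ad_grad,) = torch.autograd.grad(y, x)

with torch.no_grad():
    # The chain rule written out by hand: d/dx sin(x**2) = cos(x**2) * 2x.
    manual_grad = torch.cos(x**2) * 2 * x

    # Numerical differentiation: a central difference with a small step.
    eps = 1e-6
    fd_grad = (torch.sin((x + eps)**2) - torch.sin((x - eps)**2)) / (2 * eps)

print(ad_grad, manual_grad, fd_grad)
```

The AD result matches the hand-derived value to floating point precision, while the finite difference is only an approximation whose error depends on eps.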

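If you do want a function that returns the derivative, you could use the functorch library (its transforms are also available as torch.func in PyTorch 2.0 and later), although that’s a bit more complicated than standard PyTorch. An example for your function would be something like this: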

import torch
from functorch import grad, vmap

def f(x):
  return x**2

x = torch.arange(1, 5, dtype=torch.float32)  # dummy input: tensor([1., 2., 3., 4.])

df_dx = grad(f)  # returns a function that computes df/dx via reverse-mode AD

vmap(df_dx, in_dims=0)(x)  # applies df_dx to each element; returns tensor([2., 4., 6., 8.])

The problem is not with how we compute the gradient but the definition of gradients itself.
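Mathematically, for a gradient to be defined at a point, the function must be defined on a neighbourhood of that point and be continuous there (at least locally). A function defined only on integers does not satisfy this.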
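A small sketch of how this shows up in practice, assuming a recent PyTorch release: torch.arange with integer arguments produces an int64 tensor, requires_grad is False on a freshly created tensor, and asking an integer tensor to require gradients raises the RuntimeError from the original post. The names x_int, x_float and message are just for illustration.

```python
import torch

x_int = torch.arange(0, 3)
print(x_int.dtype)          # integer arguments give an integer dtype
print(x_int.requires_grad)  # new tensors do not require grad by default

message = None
try:
    # Integer tensors cannot be marked as requiring gradients.
    x_int.requires_grad_(True)
except RuntimeError as err:
    message = str(err)
print(message)

# Requesting a floating point dtype makes the same call succeed.
x_float = torch.arange(0, 3, dtype=torch.float32).requires_grad_(True)
print(x_float.requires_grad)
```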