Gradcheck fails despite correct implementation of backward

Fails for the following function:

import torch
from torch.autograd import Function

class FRUFunction(Function):
    @staticmethod
    def forward(ctx, s, a):
        k = 1 / (a + s - a * s)

        ctx.save_for_backward(s, k, a)
        
        return k * s

    @staticmethod
    def backward(ctx, grad_output):
        s, k, a = ctx.saved_tensors
        grad_s = grad_a = None
        
        if ctx.needs_input_grad[0]:
            grad_s = a / (a + s - a * s) ** 2

        return grad_s, grad_a

Check with code:

from torch.autograd import gradcheck

fru = FRUFunction.apply

input = torch.tensor([[0.0971, 0.5413, 0.8107]], dtype=torch.double, requires_grad=True)
alpha = torch.tensor([[0.2904, -0.3183, 0.7001]], dtype=torch.double, requires_grad=False)
fru_input = (
    input,
    alpha,
)
test = gradcheck(fru, fru_input, eps=1e-3)

It throws a runtime exception:

RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 2.2495,  0.0000,  0.0000],
        [ 0.0000, -2.0370,  0.0000],
        [ 0.0000,  0.0000,  0.7869]], dtype=torch.float64)
analytical:tensor([[ 2.2495,  2.2495,  2.2495],
        [-2.0370, -2.0370, -2.0370],
        [ 0.7869,  0.7869,  0.7869]], dtype=torch.float64)

NOTE: It even fails for

@staticmethod
def forward(ctx, s, a):
    k = 1 / (a + s - a * s)
    ctx.save_for_backward(s, k, a)
    
    return k

@staticmethod
def backward(ctx, grad_output):
    s, k, a = ctx.saved_tensors
    grad_s = grad_a = None
    if ctx.needs_input_grad[0]:
        grad_s = (a-1) / ((a + s - a*s) ** 2)
    
    return grad_s, grad_a

and for

@staticmethod
def forward(ctx, s, a):
    k = a + s - a * s
    ctx.save_for_backward(s, k, a)
    
    return k

@staticmethod
def backward(ctx, grad_output):
    s, k, a = ctx.saved_tensors
    grad_s = grad_a = None
    if ctx.needs_input_grad[0]:
        grad_s = a-1
    
    return grad_s, grad_a

and for

@staticmethod
def forward(ctx, s, a):
    k = a * s
    ctx.save_for_backward(s, k, a)
    
    return k

@staticmethod
def backward(ctx, grad_output):
    s, k, a = ctx.saved_tensors
    grad_s = grad_a = None
    if ctx.needs_input_grad[0]:
        grad_s = a
    
    return grad_s, grad_a

Hey,

It looks like none of your formulas uses the grad_output parameter?

What the backward function of a function f should compute is the vector-Jacobian product between grad_output and the Jacobian of f.
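
For an elementwise function like yours, that just means returning grad_output times the local derivative, not the derivative alone. A minimal sketch of the pattern (using x ** 2 as a stand-in rather than your FRU formula):

import torch
from torch.autograd import Function, gradcheck

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # vector-Jacobian product: grad_output times the local derivative 2x,
        # not the local derivative on its own
        return grad_output * 2 * x

x = torch.randn(3, dtype=torch.double, requires_grad=True)
print(gradcheck(Square.apply, (x,)))  # True

That also explains the pattern in your first error: the numerical Jacobian is diagonal, as expected for an elementwise function, while the analytical one repeats the same derivative values everywhere, because backward hands back the full local derivative regardless of which one-hot grad_output gradcheck passes in.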

Not sure I got you. I tried to follow that guide: Extending PyTorch — PyTorch 1.8.0 documentation

Do you suggest grad_output * grad_s?

So, I’ve tried:

@staticmethod
def forward(ctx, s, a):
    k = 1 / (a + s - a * s)

    ctx.save_for_backward(s, a, k)

    return k

@staticmethod
def backward(ctx, grad_output):
    s, a, k = ctx.saved_tensors
    grad_s = None
    grad_s = (1 - a) * (k ** 2)

    return grad_output * grad_s, None

I still get:

RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[-5.4966,  0.0000,  0.0000],
        [ 0.0000, -8.4367,  0.0000],
        [ 0.0000,  0.0000, -0.3371]], dtype=torch.float64)
analytical:tensor([[5.4966, 0.0000, 0.0000],
        [0.0000, 8.4366, 0.0000],
        [0.0000, 0.0000, 0.3371]], dtype=torch.float64)
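
The two Jacobians now only differ in sign, which points at the local derivative itself: for k = 1 / (a + s - a * s), dk/ds = -(1 - a) * k**2, i.e. (a - 1) * k**2 rather than (1 - a) * k**2. A quick standalone finite-difference check (plain tensors, no custom Function) agrees:

import torch

s = torch.tensor([0.0971, 0.5413, 0.8107], dtype=torch.double)
a = torch.tensor([0.2904, -0.3183, 0.7001], dtype=torch.double)
eps = 1e-6

def f(s):
    return 1 / (a + s - a * s)

k = f(s)
numerical = (f(s + eps) - f(s - eps)) / (2 * eps)  # central difference
print(numerical)           # matches (a - 1) * k**2 ...
print((a - 1) * k ** 2)    # ... i.e. the negation of (1 - a) * k**2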

Well, it seems that piece does work for me:

    @staticmethod
    def forward(ctx, s, a):
        k = 1 / (a + s - a * s)

        ctx.save_for_backward(s, a, k)

        return k * s

    @staticmethod
    def backward(ctx, grad_output):
        s, a, k = ctx.saved_tensors
        grad_s = None
        grad_s = k + s * (a - 1) * (k ** 2)

        return grad_output * grad_s, None
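
For reference, a self-contained version of that check (assuming fru = FRUFunction.apply, as in the first post) looks roughly like this:

import torch
from torch.autograd import Function, gradcheck

class FRUFunction(Function):
    @staticmethod
    def forward(ctx, s, a):
        k = 1 / (a + s - a * s)
        ctx.save_for_backward(s, a, k)
        return k * s

    @staticmethod
    def backward(ctx, grad_output):
        s, a, k = ctx.saved_tensors
        grad_s = None
        if ctx.needs_input_grad[0]:
            # d(k * s)/ds = k + s * (a - 1) * k**2 (= a * k**2), times grad_output
            grad_s = grad_output * (k + s * (a - 1) * k ** 2)
        return grad_s, None

fru = FRUFunction.apply
input = torch.tensor([[0.0971, 0.5413, 0.8107]], dtype=torch.double, requires_grad=True)
alpha = torch.tensor([[0.2904, -0.3183, 0.7001]], dtype=torch.double, requires_grad=False)
print(gradcheck(fru, (input, alpha), eps=1e-3))  # expected: True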

That looks better, yes.

In the tutorial you linked, the formula is implemented properly and uses grad_output.
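
For comparison, the LinearFunction example in that tutorial follows the same pattern: every returned gradient is built from grad_output. Roughly (from memory, so check the linked page for the exact code):

from torch.autograd import Function

class LinearFunction(Function):
    @staticmethod
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)      # vJp w.r.t. input
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)  # vJp w.r.t. weight
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)           # vJp w.r.t. bias
        return grad_input, grad_weight, grad_bias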