How to Compare Custom CUDA Gradients with PyTorch's Autograd Gradients?

Hi PyTorch Community,

I’m working on a project where I’ve implemented a custom torch.autograd.Function that wraps a CUDA operation. My custom CUDA functions (both forward and backward) are critical for performance and currently support only the float32 (torch.float) data type.

I want to verify that my custom backward implementation is correct by comparing the gradients computed by my custom backward function with the gradients computed by PyTorch’s autograd engine. However, I’m facing challenges due to the data type constraints and the use of custom CUDA code.

Here’s what I’m trying to achieve:

1. Include my custom CUDA function in the computational graph.
2. Use PyTorch’s autograd to compute gradients based on the forward pass.
3. Compare my custom backward gradients with the autograd-computed gradients to ensure correctness.

My Challenges:

  • Data Type Constraint: torch.autograd.gradcheck’s default tolerances are designed for double-precision (torch.double) inputs, so the numerical check is unreliable with torch.float (float32). Modifying the CUDA code to support double would be a significant effort I’d like to avoid. (I sketch the kind of call I mean right after this list.)
  • Gradient Checking: I’m unsure how to perform gradient checking in this scenario where my custom functions are limited to float32.
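
To make the constraint concrete, this is roughly the call I would like to be able to run. It’s only a sketch: MyCustomFunction is defined in the pseudo-code further down, and the eps/atol/rtol values are guesses on my part; I’m not sure relaxing them this far still gives a meaningful check.

import torch
from torch.autograd import gradcheck

x = torch.randn(10, dtype=torch.float32, device="cuda", requires_grad=True)
# gradcheck's defaults (eps=1e-6, atol=1e-5) are tuned for double precision,
# so with float32 I would presumably have to relax them:
ok = gradcheck(MyCustomFunction.apply, (x,), eps=1e-3, atol=1e-3, rtol=1e-2)
print(ok)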

My Questions:

  1. Can a custom CUDA function be part of the computational graph, allowing PyTorch to compute autograd gradients based on the forward pass?
  2. If so, how can I extract PyTorch’s autograd gradients for comparison with my custom backward gradients?
  3. Is there a recommended approach or tool for performing gradient checking with custom CUDA functions that only support float32?

Illustrative Pseudo-Code:

Here’s simplified pseudo-code illustrating my setup:

import torch

class MyCustomFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Call my custom CUDA forward function (supports float32 only)
        output = my_cuda_forward(input)
        ctx.save_for_backward(input)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # Call my custom CUDA backward function
        grad_input = my_cuda_backward(grad_output, input)
        return grad_input

# Example usage
input = torch.randn(10, dtype=torch.float32, device="cuda", requires_grad=True)  # CUDA tensor, since the op is CUDA-only
output = MyCustomFunction.apply(input)
loss = output.sum()
loss.backward()

# Now, I want to compare my custom backward gradients with PyTorch's autograd gradients
# How can I achieve this?
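
One idea I’ve had (not sure it’s the intended approach) is to also write a reference version of the op in plain, differentiable PyTorch ops, let autograd differentiate that, and compare the resulting input gradient against the one produced by my custom backward. In the sketch below, my_reference_forward is a hypothetical pure-PyTorch re-implementation of the same operation:

base = torch.randn(10, dtype=torch.float32, device="cuda")

# Path 1: custom CUDA forward + custom CUDA backward
x_custom = base.clone().requires_grad_(True)
MyCustomFunction.apply(x_custom).sum().backward()

# Path 2: reference implementation, differentiated by autograd
x_ref = base.clone().requires_grad_(True)
my_reference_forward(x_ref).sum().backward()

print(torch.allclose(x_custom.grad, x_ref.grad, rtol=1e-4, atol=1e-5))
print((x_custom.grad - x_ref.grad).abs().max())

Is this a reasonable way to do the comparison, or is there a more idiomatic tool for it?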

Additional Context:

  • PyTorch Version: 2.5.1
  • CUDA Version: (e.g., 12.1)
  • Operating System: (e.g., Ubuntu 22.04)

What I’ve Tried:

  • Using gradcheck with float32 tensors: gradcheck does run, but it warns that the inputs are not double precision, and the numerical check fails with the default tolerances, which are designed for torch.double.
  • Implementing Custom Gradient Checking: I’m considering writing my own finite-difference gradient checker (rough sketch below), but I’m wondering if there’s a better approach.
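
This is roughly the finite-difference check I have in mind: central differences through the float32 forward pass, compared against the analytic gradient from my custom backward. Again only a sketch; the eps value and the way I compare the results are guesses.

def finite_difference_grad(func, x, eps=1e-3):
    # Numerically estimate d(sum(func(x)))/dx with central differences.
    grad = torch.zeros_like(x)
    flat = x.view(-1)
    for i in range(flat.numel()):
        orig = flat[i].item()
        flat[i] = orig + eps
        plus = func(x).sum().item()
        flat[i] = orig - eps
        minus = func(x).sum().item()
        flat[i] = orig
        grad.view(-1)[i] = (plus - minus) / (2 * eps)
    return grad

base = torch.randn(10, dtype=torch.float32, device="cuda")

# Analytic gradient from my custom backward
x_custom = base.clone().requires_grad_(True)
MyCustomFunction.apply(x_custom).sum().backward()

# Numerical gradient from finite differences (forward passes only)
with torch.no_grad():
    num_grad = finite_difference_grad(MyCustomFunction.apply, base.clone())

print((x_custom.grad - num_grad).abs().max())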

Any guidance or suggestions would be greatly appreciated!

Thank you!