Gradcheck works on functions independently, but not in sequence

I’ve implemented some custom functions in CUDA. They pass gradcheck individually, but when I compose them, they fail. Is this expected behavior? I can’t imagine it would be. The structure of my function is:

import torch
import torch.nn as nn
from my_src_code import my_backend1, my_backend2

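class MyFunction(torch.autograd.Function):
	@staticmethod
	def forward(ctx, input):
		# Save the input so backward can retrieve it.
		ctx.save_for_backward(input)
		out = my_backend1.forward(input)
		return my_backend2.forward(out)

	@staticmethod
	def backward(ctx, grad_output):
		input, = ctx.saved_tensors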

		grad_output = my_backend2.backward(grad_output)
		grad_input = my_backend1.backward(
		return grad_input

class MyLayer(nn.Module):

	def __init__(self):
		super(MyLayer, self).__init__()
		# ...

	def forward(self, x):
		return MyFunction.apply(x)

Both my_backend1 and my_backend2 pass gradcheck on their own.

Are you using float64 tensors for gradcheck?
Using float32 might fail due to floating point precision issues.
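For reference, here is a minimal sketch of what a double-precision check of a composed function could look like. `Square` and `Sine` are hypothetical stand-ins for my_backend1 and my_backend2 (the real CUDA backends aren't shown in this thread), so treat this as an illustration of the setup rather than the actual code:

```python
import torch
from torch.autograd import gradcheck


class Square(torch.autograd.Function):
    """Hypothetical stand-in for my_backend1: y = x^2."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output


class Sine(torch.autograd.Function):
    """Hypothetical stand-in for my_backend2: y = sin(x)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sin(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return torch.cos(x) * grad_output


def composed(x):
    # Same structure as the question: backend1 first, then backend2.
    return Sine.apply(Square.apply(x))


torch.manual_seed(0)
# float64 input with requires_grad=True, as gradcheck expects.
x = torch.randn(4, dtype=torch.float64, requires_grad=True)

ok_each = gradcheck(Square.apply, (x,)) and gradcheck(Sine.apply, (x,))
ok_composed = gradcheck(composed, (x,))
print(ok_each, ok_composed)
```

If the pieces pass individually but the composition fails with float64 inputs, the dtype of the intermediate and output tensors is a reasonable next thing to inspect.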


As it turns out, I was initializing the returned tensor as a float. Because of the kernel launch macro for floating-point types, I didn’t realize I was actually doing an implicit conversion in the kernel. This became clear when I tested with actual decimal data instead of integers.
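A small sketch of why integer-valued test data can hide this kind of bug, assuming the implicit conversion was a float64 to float32 cast (the kernel itself isn't shown here, so that part is a guess). `survives_float32` is a hypothetical helper written for this illustration:

```python
import torch


def survives_float32(x):
    """Return True if casting float64 -> float32 -> float64 leaves x unchanged."""
    return torch.equal(x.to(torch.float32).to(torch.float64), x)


# Small integers are exactly representable in float32, so the cast is invisible.
integers = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
# Values like 0.1 are rounded differently in float32 and float64.
decimals = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float64)

print(survives_float32(integers))  # True
print(survives_float32(decimals))  # False
```

Checking that the output dtype matches the input dtype (for example `out.dtype == input.dtype`) before running gradcheck is a cheap way to catch this earlier.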

It’s always something simple…