Why does gradcheck fail for floats?

I have implemented my own convolution layer as a learning experience in extending PyTorch. Looking at printouts for small tensors, my outputs and gradients appear to match those of the native PyTorch convolutions.

For much larger tensors I have to rely on a sum-of-absolute-differences (SAD) metric between my implementation's outputs/grads and the built-in's. I've found that while my forward-pass outputs and my gradients w.r.t. the inputs match the native implementation every time, my gradients w.r.t. the weights and biases are often slightly off. The per-element error is fairly small for small tensors (<1e-5), but as the tensor size grows to something like (1, 512, 128, 256), I see average per-element errors close to 1e-2 or higher.
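A comparison along these lines might look like the sketch below; the shapes are kept small and illustrative, and `my_conv2d` is a hypothetical stand-in for the custom layer, which the thread doesn't show:

```python
import torch

def per_element_abs_err(mine: torch.Tensor, ref: torch.Tensor) -> float:
    """Average per-element absolute difference between two tensors."""
    return (mine - ref).abs().mean().item()

# Compare a (hypothetical) custom conv against the native one.
x = torch.randn(1, 8, 16, 16)
w = torch.randn(4, 8, 3, 3)
ref_out = torch.nn.functional.conv2d(x, w, padding=1)
# my_out = my_conv2d(x, w, padding=1)   # hypothetical custom implementation
# print(per_element_abs_err(my_out, ref_out))
```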

When I set all the input and grad-output elements (i.e. the argument to backward) to whole numbers, the error disappears, so I suspected a precision issue.

Sure enough, my implementation passes gradcheck when all values are whole-number floats or doubles. It fails when using non-integer floats, which gradcheck itself warns will happen. Why does this happen?
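For reference, this is the usual recipe: run gradcheck on float64 inputs with `requires_grad=True` (sketched here with the built-in `conv2d` standing in for the custom op, since the thread doesn't show the actual implementation):

```python
import torch
from torch.autograd import gradcheck

# gradcheck compares analytical gradients against finite differences;
# it is designed to be run with double-precision inputs.
x = torch.randn(1, 2, 5, 5, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, 3, 3, dtype=torch.double, requires_grad=True)

ok = gradcheck(lambda x, w: torch.nn.functional.conv2d(x, w), (x, w))
print(ok)  # True when analytical and numerical gradients agree
```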

Weirder still, the errors when using floats are often the same set of values but move around the returned tensor from run to run (e.g. all the SAD errors are 0.002 or 0.0039 with one set of convolution parameters, or 0.0625, 0.125, and 0.25 with another). I thought I was accidentally reading uninitialized memory in my cuBLAS gemm call, but switching to doubles largely resolves the issue.
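One plausible explanation for errors that move around from run to run (an assumption on my part, not something confirmed in the thread) is a GPU reduction, e.g. atomic adds in a weight-gradient kernel, accumulating in a different order each run. Floating-point addition is not associative, so the same set of addends can round to different results:

```python
# The same three addends, summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False: rounding depends on accumulation order
```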

Am I looking at a precision issue? If so, why does it only affect weight and bias gradients and not input gradients?

Yeah, 32-bit floating point usually doesn't have enough precision to pass numerical gradient checks.
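One way to see why (a sketch, not from the thread): float32 has roughly 7 decimal digits of precision (machine epsilon ≈ 1.19e-7), so the tiny perturbation step a finite-difference check uses is largely swallowed by rounding, leaving only a digit or two of accuracy in the numerical gradient estimate:

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to float32 precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

# A perturbation well below float32 machine epsilon vanishes entirely,
# and a step like 1e-6 is barely representable next to 1.0.
print(f32(1.0 + 1e-8) == 1.0)  # True: the perturbation is lost
print(f32(1.0 + 1e-6) == 1.0)  # False: representable, but only just
```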

But I’m checking the gradients for float32 data types, so is there a minimal change to the gradcheck and gradgradcheck APIs that would let them serve as a proper test for float32?
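If you do want to run gradcheck on float32 anyway, `torch.autograd.gradcheck` accepts `eps`, `atol`, `rtol`, and `nondet_tol` keyword arguments, so the step size and tolerances can be loosened. The values below are illustrative guesses, not recommendations from the thread, and the usual advice remains to verify correctness in float64 first:

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(1, 2, 5, 5, requires_grad=True)  # float32
w = torch.randn(3, 2, 3, 3, requires_grad=True)

# Larger finite-difference step and looser tolerances for float32;
# nondet_tol allows some run-to-run variation in nondeterministic kernels.
ok = gradcheck(
    lambda x, w: torch.nn.functional.conv2d(x, w),
    (x, w),
    eps=1e-3, atol=1e-2, rtol=1e-2,
    nondet_tol=1e-3,
)
print(ok)
```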