Precision problem in gradient check of custom function

Hi, I implemented a custom function and used the gradcheck tool in PyTorch to check whether there are any implementation issues, but it did not pass the gradient check, apparently because of some loss of precision.
I set eps=1e-6 and atol=1e-4, but I could not find any issue in my implementation.
Suggestions would be appreciated.

Edit: I have posted my code below:

    @staticmethod
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        ctx.num = input.size(0)
        output = input.clone().zero_()
        # elementwise (Hadamard) product of each row of input with weight
        for i in range(ctx.num):
            output[i, :] = torch.mul(input[i, :], weight)

        if bias is not None:
            output += bias

        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_variables

        grad_input = grad_weight = grad_bias = None

        if ctx.needs_input_grad[0]:
            grad_input = grad_output.clone().zero_()
            for i in range(ctx.num):
                grad_input[i, :] = torch.mul(grad_output[i, :], weight)

        if ctx.needs_input_grad[1]:
            # accumulate the gradient w.r.t. weight over the batch
            grad_weight = weight.clone().zero_()
            for i in range(ctx.num):
                grad_weight += torch.mul(grad_output[i, :], weight)

        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = torch.sum(grad_output, 0)

        if bias is not None:
            return grad_input, grad_weight, grad_bias
        else:
            return grad_input, grad_weight
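
This is roughly how such a check is typically invoked (a sketch only; HadamardProd is a placeholder name, since the snippets above omit the class wrapper, and gradcheck is documented to expect double-precision inputs):

    import torch
    from torch.autograd import gradcheck

    # Placeholder shapes: a (4, 5) input, a (5,) weight and a (5,) bias.
    input = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
    weight = torch.randn(5, dtype=torch.double, requires_grad=True)
    bias = torch.randn(5, dtype=torch.double, requires_grad=True)

    # Compares the analytical gradients from backward() against
    # finite-difference estimates computed with step size eps.
    print(gradcheck(HadamardProd.apply, (input, weight, bias), eps=1e-6, atol=1e-4))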

What order of magnitude are the gradients? A general rule of thumb is to set eps to about 1e-6 times the maximum magnitude of the gradient, because single-precision floating point is usually accurate to only about 6 decimal places.

Sorry, I know very little about autograd, and I do not know how to extract the gradients that gradcheck uses. Is there any documentation you would suggest on this?
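
One way to see the gradient magnitudes without digging into gradcheck itself is to run a plain forward/backward pass and inspect .grad directly; a rough sketch, reusing the double-precision tensors from the gradcheck snippet above:

    # Rough sketch: run the custom Function once and look at the gradient sizes.
    out = HadamardProd.apply(input, weight, bias)
    out.sum().backward()  # upstream gradient of all ones

    print(input.grad.abs().max())   # largest gradient w.r.t. input
    print(weight.grad.abs().max())  # largest gradient w.r.t. weight
    print(bias.grad.abs().max())    # largest gradient w.r.t. bias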

The goal of my custom function is to compute the Hadamard (pointwise) product of the input and the weights, where the weights are parameters to be trained.
The script above can also be simplified as follows, but I still cannot figure out my implementation error.

    @staticmethod
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)

        output = torch.mul(input, weight)
        if bias is not None:
            output += bias

        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_variables

        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = torch.mul(grad_output, weight)
        if ctx.needs_input_grad[1]:
            grad_weight = torch.sum(grad_output * weight, 0)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = torch.sum(grad_output, 0)

        if bias is not None:
            return grad_input, grad_weight, grad_bias
        else:
            return grad_input, grad_weight
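
Since this forward pass is built entirely from differentiable built-in ops, a useful sanity check is to compare the hand-written backward against plain autograd on the same expression; a small sketch with the same shapes as above:

    # Sketch: reference gradients from built-in autograd for the same computation.
    x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
    w = torch.randn(5, dtype=torch.double, requires_grad=True)
    b = torch.randn(5, dtype=torch.double, requires_grad=True)

    (x * w + b).sum().backward()
    # x.grad, w.grad and b.grad now hold reference gradients that the custom
    # backward should reproduce for an upstream gradient of all ones.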

@richard
I have figured out an implementation error:

    grad_weight += torch.mul(grad_output[i,:], weight)

should be replaced by:

    grad_weight += torch.mul(grad_output[i,:], input[i,:])

and correspondingly, in the second version of the script,

    grad_weight = torch.sum(grad_output * input, 0)

But it still did not pass the gradient check.
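
For completeness, two things that commonly keep gradcheck failing even once the math is right: running it on single-precision tensors (the finite-difference estimate at eps=1e-6 is too noisy in float32, so the inputs should be cast to double), and having backward return fewer values than forward has arguments (the documented pattern is to return one gradient per input, with None for inputs that need no gradient). A minimal sketch of the backward with both the grad_weight fix and the full return, keeping ctx.saved_variables as in the snippets above (newer PyTorch versions call this ctx.saved_tensors):

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_variables

        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = torch.mul(grad_output, weight)
        if ctx.needs_input_grad[1]:
            # gradient w.r.t. weight uses the saved input, summed over the batch
            grad_weight = torch.sum(grad_output * input, 0)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = torch.sum(grad_output, 0)

        # always return one gradient per forward argument; None is fine for
        # inputs that do not need a gradient
        return grad_input, grad_weight, grad_bias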