Hello. I have implemented my own loss function, which involves computing a (one-dimensional) integral, in two different ways. The first one uses “torch.trapz” to compute the integral numerically, and the other one computes the integral exactly in closed form.
I have run a lot of experiments to confirm that the two implementations are correct and return the same loss values. There are only tiny numerical approximation differences, for example 15020.1720 instead of 15020.1724. In other words, we can assume that the two implementations are equivalent as far as the loss value is concerned.
One thing I do not understand is why I get (very) different gradients for my parameters (calling backward() on my loss). I was expecting to get (almost) the same gradients, given that the two functions compute the same quantity.
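To make the comparison concrete, here is a minimal sketch of the kind of check I mean. It is not my actual loss: the integrand exp(-a * x) on [0, 1], the parameter `a`, and the function names `loss_trapz` and `loss_closed_form` are hypothetical stand-ins chosen only because the closed form is easy to write down.

```python
import torch

def loss_trapz(a, n=2001):
    # Hypothetical toy loss: integral of exp(-a * x) over [0, 1],
    # approximated with the trapezoidal rule on n grid points.
    x = torch.linspace(0.0, 1.0, n, dtype=torch.float64)
    y = torch.exp(-a * x)
    return torch.trapz(y, x)

def loss_closed_form(a):
    # Same integral written exactly: (1 - exp(-a)) / a.
    return (1.0 - torch.exp(-a)) / a

# Two independent leaf tensors with the same value, so the gradients
# of the two implementations do not accumulate into each other.
a1 = torch.tensor(1.5, dtype=torch.float64, requires_grad=True)
a2 = torch.tensor(1.5, dtype=torch.float64, requires_grad=True)

l1 = loss_trapz(a1)
l2 = loss_closed_form(a2)
l1.backward()
l2.backward()

print("loss:", l1.item(), l2.item())
print("grad:", a1.grad.item(), a2.grad.item())
```

In a toy case like this one, both the loss values and the gradients should agree up to the discretization error of the trapezoidal rule, which is the behaviour I was expecting from my own two implementations.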
Am I missing something? I know it is hard to say without seeing the code, but what would be the typical error for a non-expert in this case? What should I look at?