Suggestions for making this equation autograd-able?

You are right, just using cost = cost + … instead of += is the way to go.
I am facing convergence issues but that is probably related to the nature of the problem. I can see a nonzero gradient flow through the network, therefore the loss as I coded it should be able to compute the gradient automatically. This seems to me quite sound… (PS: the plot is done according to this old discussion Check gradient flow in network - #7 by RoshanRane)