I have some functions that need to store the error values at the current layer as they are passed through. For example, if the layer's nonlinearity is a sigmoid, the line of code looks like:

I am looking to do the same thing for the top layer, which requires functions for the actual forward and backward steps of the loss function. Question 1: is there any generic way to do this, or a place within the PyTorch definitions of these functions where I can look? Question 2: if not, can anyone help me write what these might look like for some common loss functions such as MSE and CrossEntropy?

Sigmoid prime is the derivative of the sigmoid function, which would be needed to compute the gradient in the backward pass if autograd were not already doing so under the hood.
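Since the original line of code is not shown, here is a minimal sketch of the idea, assuming the goal is to record the error dL/dz at a sigmoid layer during the backward pass. The names `sigmoid_prime`, `SigmoidWithError` and the `store` dict are hypothetical; the mechanism is a custom `torch.autograd.Function` whose backward applies the chain rule with sigma'(z) = sigma(z)(1 - sigma(z)) and saves the result before returning it.

```python
import torch


def sigmoid_prime(z: torch.Tensor) -> torch.Tensor:
    """Derivative of the logistic sigmoid: s(z) * (1 - s(z))."""
    s = torch.sigmoid(z)
    return s * (1 - s)


class SigmoidWithError(torch.autograd.Function):
    """Hypothetical sigmoid Function that records the error it computes in backward."""

    @staticmethod
    def forward(ctx, z, store):
        # Keep the pre-activation for the backward pass; `store` is a plain dict.
        ctx.save_for_backward(z)
        ctx.store = store
        return torch.sigmoid(z)

    @staticmethod
    def backward(ctx, grad_output):
        (z,) = ctx.saved_tensors
        # Chain rule: dL/dz = dL/da * sigma'(z), where a = sigma(z).
        error = grad_output * sigmoid_prime(z)
        ctx.store["error"] = error.detach()
        # One gradient per forward input: z gets `error`, `store` gets None.
        return error, None


store = {}
z = torch.randn(4, 3, requires_grad=True)
SigmoidWithError.apply(z, store).sum().backward()
# With a sum() loss, grad_output is all ones, so the error equals sigmoid_prime(z).
print(torch.allclose(store["error"], sigmoid_prime(z.detach())))  # True
print(torch.allclose(z.grad, sigmoid_prime(z.detach())))  # True
```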

For the earlier layers, the module just has a Conv2d or Linear layer followed by a sigmoid, and there it seems to do what I want. But at the top layer, which is just a Linear feeding into the loss function, I skip this step and use grad_output directly, and that does not appear to work.
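As a possible generic route for the top layer, assuming you only need to observe the error dL/d(output) rather than replace autograd's computation, `Tensor.register_hook` on the Linear output hands you that gradient for whatever loss follows. The layer sizes and the names `captured` and `save_top_error` below are illustrative; the final check confirms that for mean-reduced MSE the captured error is 2/N * (out - target).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

top = nn.Linear(5, 3)  # top layer: a Linear feeding straight into the loss
x = torch.randn(8, 5)
target = torch.randn(8, 3)

captured = {}


def save_top_error(grad):
    # Called during backward with dL/d(out); returning None leaves grad unchanged.
    captured["top_error"] = grad.detach()


out = top(x)
out.register_hook(save_top_error)

loss = F.mse_loss(out, target)  # default reduction="mean" over all elements
loss.backward()

# For mean-reduced MSE the error should be 2/N * (out - target), N = out.numel().
expected = 2.0 / out.numel() * (out.detach() - target)
print(torch.allclose(captured["top_error"], expected))  # True
```

Swapping `F.mse_loss` for any other loss needs no hand-written derivative; the hook simply receives whatever autograd computes.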

But if you want explicit forward and backward functions, you just need to differentiate the loss functions by hand.
For example, MSE does this in the forward: mse(x, y) = (x - y).pow(2).mean()
Then you can differentiate with respect to x to get mse_backward_x(grad_out, x, y) = 2/N * grad_out.expand_as(x) * (x - y), where N = x.numel() is the number of elements being averaged.
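A minimal sketch of hand-written forward and backward functions for the two losses asked about, assuming mean reduction (PyTorch's default) and, for cross-entropy, logits of shape (N, C) with integer class targets. All function names are hypothetical. For cross-entropy the gradient with respect to the logits works out to (softmax(logits) - one_hot(target)) / N. Both backward functions are compared against autograd at the end.

```python
import torch
import torch.nn.functional as F


def mse_forward(x, y):
    # Mean over all N = x.numel() elements, matching F.mse_loss(reduction="mean").
    return (x - y).pow(2).mean()


def mse_backward_x(grad_out, x, y):
    # d/dx mean((x - y)^2) = 2 / N * (x - y); grad_out is the scalar upstream gradient.
    return 2.0 / x.numel() * grad_out.expand_as(x) * (x - y)


def cross_entropy_forward(logits, target):
    # logits: (N, C) raw scores; target: (N,) class indices; mean over the batch.
    return -F.log_softmax(logits, dim=1).gather(1, target.unsqueeze(1)).mean()


def cross_entropy_backward_logits(grad_out, logits, target):
    # d/dlogits = (softmax(logits) - one_hot(target)) / N.
    n, c = logits.shape
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=c).to(probs.dtype)
    return grad_out * (probs - one_hot) / n


# Compare the hand-written gradients against autograd.
torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)
y = torch.randn(4, 3)
F.mse_loss(x, y).backward()
mse_grad = mse_backward_x(torch.tensor(1.0), x.detach(), y)
print(torch.allclose(x.grad, mse_grad))  # True

logits = torch.randn(4, 5, requires_grad=True)
target = torch.tensor([0, 2, 4, 1])
F.cross_entropy(logits, target).backward()
ce_grad = cross_entropy_backward_logits(torch.tensor(1.0), logits.detach(), target)
print(torch.allclose(logits.grad, ce_grad, atol=1e-6))  # True
```

These can be dropped into the `backward` of a custom `torch.autograd.Function` in the same way as the sigmoid case, returning one gradient per forward input.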