What are the steps to manually calculate the backpropagation gradients for the architecture I mentioned? I'm confused because the architectures used in the backprop explanations I found on Google are different from the neural network architecture I use. In particular, I'm confused about the linear layer that doesn't use an activation function, and about how to calculate the gradient through the batch norm layer using its derivative.

I've tried to calculate the gradient of the loss with respect to the output (the loss I use is BCEWithLogitsLoss). Then I tried to calculate the gradient of the linear layer by multiplying the output of the previous layer by the gradient of the loss with respect to the output, and from this point on I feel like I'm doing it wrong. What I want in the end is a gradient value for each weight that I can use to update the weights with the Adam optimizer.
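The steps asked about can be sketched in NumPy. This is a minimal sketch under assumptions, since the architecture itself is not shown in this post: it assumes Linear, then BatchNorm (training mode, batch statistics), then ReLU, then a final Linear layer with no activation that outputs one logit, trained with BCEWithLogitsLoss using mean reduction. The layer sizes, the parameter names (W1, b1, gamma, beta, W2, b2) and the helpers forward, backward and adam_step are hypothetical. It illustrates three points. First, BCEWithLogitsLoss folds the sigmoid into the loss, so the gradient with respect to the logits is simply (sigmoid(z) - y) / N. Second, a linear layer without an activation just has no activation-derivative factor: its weight gradient is the transposed layer input (the previous layer's output) times the upstream gradient, summed over the batch, and the gradient passed further back is the upstream gradient times the transposed weights. Third, the batch-norm backward pass must account for the batch mean and variance depending on every sample.

```python
import numpy as np

# ASSUMED architecture (hypothetical, not taken from the question):
#   x -> Linear(W1, b1) -> BatchNorm(gamma, beta) -> ReLU -> Linear(W2, b2) -> logit z
#   loss = BCEWithLogitsLoss(z, y) with mean reduction over the batch


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, x, y, eps=1e-5):
    h = x @ params["W1"] + params["b1"]           # first linear layer, shape (N, H)
    mu = h.mean(axis=0)                           # batch mean per feature
    var = h.var(axis=0)                           # biased batch variance (training mode)
    h_hat = (h - mu) / np.sqrt(var + eps)         # normalized activations
    a = params["gamma"] * h_hat + params["beta"]  # batch-norm output
    r = np.maximum(a, 0.0)                        # ReLU
    z = r @ params["W2"] + params["b2"]           # last linear layer, no activation: logits (N, 1)
    # stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|)), averaged over N
    loss = np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z))))
    return loss, (x, y, h_hat, var, a, r, z, eps)


def backward(params, cache):
    x, y, h_hat, var, a, r, z, eps = cache
    n = x.shape[0]
    g = {}
    dz = (sigmoid(z) - y) / n                     # dL/dz: the sigmoid is folded into the loss
    g["W2"] = r.T @ dz                            # layer input (transposed) times upstream grad
    g["b2"] = dz.sum(axis=0)                      # bias grad: sum over the batch
    dr = dz @ params["W2"].T                      # pass the gradient back through the linear layer
    da = dr * (a > 0)                             # ReLU derivative: 1 where a > 0, else 0
    g["gamma"] = (da * h_hat).sum(axis=0)
    g["beta"] = da.sum(axis=0)
    dh_hat = da * params["gamma"]
    inv_std = 1.0 / np.sqrt(var + eps)
    # batch-norm backward with batch statistics (mean and variance depend on every sample)
    dh = inv_std / n * (n * dh_hat - dh_hat.sum(axis=0) - h_hat * (dh_hat * h_hat).sum(axis=0))
    g["W1"] = x.T @ dh
    g["b1"] = dh.sum(axis=0)
    return g


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    state["t"] += 1
    t = state["t"]
    for k in params:
        state["m"][k] = beta1 * state["m"][k] + (1 - beta1) * grads[k]       # 1st moment
        state["v"][k] = beta2 * state["v"][k] + (1 - beta2) * grads[k] ** 2  # 2nd moment
        m_hat = state["m"][k] / (1 - beta1 ** t)                             # bias correction
        v_hat = state["v"][k] / (1 - beta2 ** t)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)


rng = np.random.default_rng(0)
N, D, H = 8, 4, 5
x = rng.normal(size=(N, D))
y = rng.integers(0, 2, size=(N, 1)).astype(float)
params = {"W1": 0.5 * rng.normal(size=(D, H)), "b1": np.zeros(H),
          "gamma": np.ones(H), "beta": np.zeros(H),
          "W2": 0.5 * rng.normal(size=(H, 1)), "b2": np.zeros(1)}
state = {"t": 0,
         "m": {k: np.zeros_like(v) for k, v in params.items()},
         "v": {k: np.zeros_like(v) for k, v in params.items()}}

loss, cache = forward(params, x, y)
grads = backward(params, cache)
adam_step(params, grads, state)
print("loss before the update:", loss)
```

In this sketch b1 ends up with a gradient of (numerically) zero, because batch norm subtracts the batch mean right after the first linear layer; that is expected rather than a bug. PyTorch's nn.Linear stores its weight as (out_features, in_features) and computes x @ W.T, so its weight gradient is the transpose of grads["W1"] here. Comparing backward against a finite-difference estimate of the loss is a cheap way to check any hand-derived gradient before feeding it to Adam.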