I was trying to calculate per-sample gradients in a naive way, i.e. by going through the data one sample at a time. Each sample is passed through the model, and the loss is calculated and backpropagated. Then I loop through each layer and collect its gradient as `layer.weight.grad.flatten()`. The problem I am facing is that the gradient for each sample is larger than the gradient for the previous one. I tried shuffling a small subset of the data to check whether the gradients match, but the same thing happens there too, i.e. the gradient keeps increasing as I go through the data one sample at a time. Could anyone provide some insight into why that could be?
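Below is a minimal sketch that reproduces the symptom. It assumes a hypothetical two-layer `nn.Sequential` model and random data, not the poster's actual code. Because `.grad` is never cleared between samples, each `backward()` adds into the existing `.grad`, so every snapshot is the running sum of all per-sample gradients seen so far:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model and data, for illustration only.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
xs = torch.randn(5, 4)
ys = torch.randn(5, 1)

def flat_grad(model):
    # Concatenate the .grad of every Linear layer's weight into one vector.
    return torch.cat([m.weight.grad.flatten()
                      for m in model if isinstance(m, nn.Linear)])

naive_grads = []
for x, y in zip(xs, ys):
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()  # BUG: .grad is never cleared, so it accumulates across samples
    naive_grads.append(flat_grad(model).clone())  # snapshot of the running sum
```

Since each snapshot is a running sum, its norm tends to grow from sample to sample, which matches the behaviour described. The `.clone()` is there because `.flatten()` may return a view of `.grad`, and later in-place accumulation could otherwise alter earlier snapshots.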

If you are using a standard optimizer, are you calling `optimizer.zero_grad()` to zero out the gradients before each step? Example: `examples/main.py` at commit cbb760d5e50a03df667cdc32a61f75ac28e11cbf in the pytorch/examples repository on GitHub.

Otherwise they will continue to accumulate with each backward pass.
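A minimal sketch of the fix, using the same kind of hypothetical model and random data: clear the gradients before every backward pass. Since no optimizer is involved, it calls `model.zero_grad()`, which clears the `.grad` of every parameter (biases included), so it covers clearing each layer's weight gradient by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model and data, for illustration only.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
xs = torch.randn(5, 4)
ys = torch.randn(5, 1)

per_sample_grads = []
for x, y in zip(xs, ys):
    model.zero_grad()  # drop the .grad left over from the previous sample
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()    # .grad now holds only this sample's gradient
    # torch.cat builds a new tensor, so no extra clone is needed here.
    per_sample_grads.append(torch.cat([
        p.grad.flatten()
        for name, p in model.named_parameters() if name.endswith("weight")
    ]))
```

With the gradients cleared, each entry of `per_sample_grads` depends only on its own sample, so shuffling the data just permutes the list instead of changing the values.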

I wasn’t using an optimizer, but clearing the grad of each layer’s weights solved the problem. I’m not sure, though, whether optimization based on the gradient would still work, since I cleared the gradients of the weights.
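A minimal sketch bearing on that doubt, assuming a hypothetical linear model and plain `torch.optim.SGD` without momentum or weight decay: clearing `.grad` only resets the stored gradient, not the weights themselves, so an update still works as long as it is applied after `backward()` has refilled `.grad`. This clear, backward, step order is exactly what `optimizer.zero_grad()` is used for:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model, data and learning rate, for illustration only.
lr = 0.1
model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()
x = torch.randn(8, 3)
y = torch.randn(8, 1)

w_before = model.weight.detach().clone()
opt.zero_grad()  # resets .grad only; the weights are untouched
assert torch.equal(model.weight.detach(), w_before)

loss = loss_fn(model(x), y)
loss.backward()  # fills .grad with the gradient of this loss
expected = w_before - lr * model.weight.grad  # plain SGD rule: w <- w - lr * grad
opt.step()       # uses the freshly computed .grad to update the weights
```

For per-sample work, the same idea applies: copy each sample's gradient out of `.grad` before clearing it again, and only step once `.grad` holds the gradient you actually want to apply.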