How can I update the gradient manually once per epoch?

I am working on a regression problem involving a very sparse dataset (<1% non-zero examples). One of the strategies I would like to try is to accumulate the gradient for each batch over an entire epoch, and only then update the model (since a batch has many zero examples and the gradients quickly tend to zero). I am thinking of a pseudocode similar to the following:

    for e in epochs:
        grad = 0
        for batch in dataloader:
            grad += model.backward(batch)
        grad /= len(dataloader)

My current code that updates the gradient after every batch is:

    for e in epochs:
        for batch in dataloader:

            # Forward pass

            x: torch.Tensor = batch[0].float().to(device)
            y: torch.Tensor = batch[1].float().to(device)

            y_pred = model_reg(x).squeeze()

            loss = criterion(y_pred, y)

            # Backward pass (update after every batch)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

Is there a way to do this in PyTorch? I know autograd is a bit special, so I wanted to seek your wisdom in these matters. My current idea is to move optimizer.zero_grad() and optimizer.step() outside of the batch loop, so that each is called only once per epoch. However, I am not sure how to divide the accumulated gradient by the number of batches at the end of an epoch.

I have been looking through the documentation and past topics and couldn’t find an answer. Any advice or help is greatly appreciated!


This post gives you a few examples of gradient accumulation approaches and their advantages or shortcomings.
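As a rough sketch of one common accumulation pattern (not necessarily the one from the linked post): since loss.backward() adds to the .grad already stored on each parameter, you can call zero_grad() and step() once per epoch and scale each batch loss by 1 / len(dataloader), which gives the averaged gradient without a separate division step. The synthetic data, the nn.Linear stand-in for model_reg, and the helper name train_one_epoch are assumptions made up for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins for the objects used in the question.
x_all = torch.randn(64, 10)
y_all = torch.randn(64)
dataloader = DataLoader(TensorDataset(x_all, y_all), batch_size=16)
model_reg = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model_reg.parameters(), lr=0.01)
device = torch.device("cpu")


def train_one_epoch(model, dataloader, criterion, optimizer, device):
    """Accumulate gradients over all batches, then update the model once."""
    optimizer.zero_grad()  # clear old gradients once per epoch
    num_batches = len(dataloader)
    for x, y in dataloader:
        x, y = x.float().to(device), y.float().to(device)
        y_pred = model(x).squeeze(-1)
        # Scaling the loss makes the summed gradients an average over batches.
        loss = criterion(y_pred, y) / num_batches
        loss.backward()  # adds to the existing .grad of each parameter
    optimizer.step()  # single parameter update per epoch


for epoch in range(2):
    train_one_epoch(model_reg, dataloader, criterion, optimizer, device)
```

Note that with one update per epoch the model moves much more slowly, so the learning rate will probably need retuning.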
Based on your current code snippet, it seems you would like to use torch.autograd.grad instead, which I think should work in a similar manner (you might need to perform the gradient accumulation manually, though).
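A minimal sketch of the manual variant, assuming a plain SGD setup: torch.autograd.grad returns the gradients instead of writing them to .grad, so you can sum them yourself, divide by the number of batches, and assign the result to each parameter's .grad before calling optimizer.step(). The helper name accumulate_epoch_gradient, the synthetic data, and the nn.Linear stand-in for model_reg are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def accumulate_epoch_gradient(model, dataloader, criterion, device):
    """Return the trainable parameters and their gradient averaged over batches."""
    params = [p for p in model.parameters() if p.requires_grad]
    grad_sums = [torch.zeros_like(p) for p in params]
    for x, y in dataloader:
        x, y = x.float().to(device), y.float().to(device)
        loss = criterion(model(x).squeeze(-1), y)
        # Gradients are returned as a tuple; .grad is left untouched.
        grads = torch.autograd.grad(loss, params)
        for total, g in zip(grad_sums, grads):
            total += g
    avg_grads = [total / len(dataloader) for total in grad_sums]
    return params, avg_grads


torch.manual_seed(0)

# Hypothetical stand-ins for the objects used in the question.
x_all = torch.randn(64, 10)
y_all = torch.randn(64)
dataloader = DataLoader(TensorDataset(x_all, y_all), batch_size=16)
model_reg = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model_reg.parameters(), lr=0.01)
device = torch.device("cpu")

for epoch in range(2):
    params, avg_grads = accumulate_epoch_gradient(
        model_reg, dataloader, criterion, device
    )
    optimizer.zero_grad()
    for p, g in zip(params, avg_grads):
        p.grad = g  # hand the averaged gradient to the optimizer
    optimizer.step()  # one update per epoch
```

Keeping the accumulation in your own tensors also makes it easy to inspect or rescale the gradient before the update.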