Multiple forward passes before backward call

Similar to this post, I would like to do multiple forward passes before calling backward, but on the same model, effectively simulating a very large batch size (as in full gradient descent) that would not fit on any GPU.

More concretely: is it possible to somehow accumulate the gradient information and perform gradient descent on a large dataset using PyTorch, and if so, how can I do it?

Many thanks!

Suppose you forward pass on 6 datapoints (6 images) at a time and you want to perform a gradient-descent step on 36 datapoints (images).

Let the dataloader load inputs with dimension [6x3x224x224], then:

for i, (input, target) in enumerate(dataloader):
    output = model(input)
    loss = lossfn(output, target) / 6  # scale before backward so accumulated gradients average over all 36 images
    loss.backward()                    # accumulates gradients into the .grad of every parameter
    if (i + 1) % 6 == 0:
        optim.step()       # one parameter update using gradients accumulated over 6 mini-batches
        optim.zero_grad()  # reset accumulated gradients before the next 6 batches
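
If you want to convince yourself that this accumulation is equivalent to one large-batch step, you can compare the accumulated gradient against the gradient from a single full-batch pass. A minimal sketch, assuming a hypothetical toy linear model and random data (not from the post above):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(8, 3)                  # hypothetical toy model
    lossfn = nn.CrossEntropyLoss()           # default reduction='mean'
    big_input = torch.randn(36, 8)           # one large batch of 36 samples
    big_target = torch.randint(0, 3, (36,))

    # Gradient from a single pass over all 36 samples
    model.zero_grad()
    lossfn(model(big_input), big_target).backward()
    big_grad = model.weight.grad.clone()

    # Gradient accumulated over 6 mini-batches of 6 samples each,
    # scaling each loss by 1/6 before backward
    model.zero_grad()
    for inp, tgt in zip(big_input.split(6), big_target.split(6)):
        (lossfn(model(inp), tgt) / 6).backward()

    # The two should match up to floating-point rounding
    print(torch.allclose(model.weight.grad, big_grad, atol=1e-6))  # True

The scaling works out because the mean loss over 36 samples equals the mean of the 6 per-batch mean losses, so dividing each mini-batch loss by 6 makes the accumulated gradient identical to the big-batch gradient.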


Turns out I was searching with the wrong keywords; this has already been discussed here and here.

@Naman-ntc
I also read this in another post. A huge thank you!
