LBFGS on a dataset larger than memory

I want to run optimization with LBFGS, but my dataset is so large that only about 1/32 of it fits in memory.
I’m planning to split the dataset into 32 batches.
Unfortunately, with this approach LBFGS would see a different gradient at every step, and I know that LBFGS requires a smooth, consistent gradient.
My idea is therefore to accumulate the gradients over all 32 batches before calling optimizer.step(), thus simulating full-batch optimization.
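
With an optimizer that doesn’t require a closure, I know how to do this: call backward() once per batch and step() once at the end. Here is a minimal sketch of that pattern, using a toy linear model and random data as stand-ins for my real setup:

```python
import torch

# Toy stand-ins for my real setup: a small linear model and random data
# split into 32 chunks, each of which would fit in memory.
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
X, y = torch.randn(3200, 10), torch.randn(3200, 1)
batches = list(zip(X.chunk(32), y.chunk(32)))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

optimizer.zero_grad()
for inputs, targets in batches:
    # Divide by the number of batches so the sum of per-batch losses
    # matches the mean loss over the full dataset.
    loss = criterion(model(inputs), targets) / len(batches)
    loss.backward()              # gradients accumulate across batches
optimizer.step()                 # one step with the full-batch gradient
```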

The problem is that LBFGS requires a closure function, and inside the closure the gradients are zeroed on every call.
How can I accumulate the gradients across all batches and have optimizer.step() use only the accumulated gradient?
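
Something like the sketch below (reusing the same toy `model`, `criterion`, and `batches` as above) is what I’m aiming for, but I’m not sure whether zeroing the gradients only once at the top of the closure and looping over all batches inside it is the right way to do this:

```python
# Same toy `model`, `criterion`, and `batches` as in the sketch above.
optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, history_size=10)

def closure():
    optimizer.zero_grad()                # zero once per closure call
    total_loss = 0.0
    for inputs, targets in batches:      # loop over all 32 batches
        loss = criterion(model(inputs), targets) / len(batches)
        loss.backward()                  # gradients add up across batches
        total_loss += loss.item()
    return total_loss                    # LBFGS re-evaluates via this closure

for _ in range(5):                       # a few full-batch LBFGS iterations
    optimizer.step(closure)
```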