How to model gradient descent without running into memory issues

That’s not the case: each tensor chunk would still hold a reference to the entire computation graph, so you would end up using the same amount of memory.
You could instead simulate the larger batch size via gradient accumulation, using one of the approaches described in this post.
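A minimal sketch of gradient accumulation, assuming a hypothetical `nn.Linear` model and random data (the model, sizes, and optimizer here are placeholders, not taken from the original thread). The key point is that calling `backward()` on each micro-batch frees that micro-batch's graph, so only one micro-batch's activations are held in memory at a time:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)          # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

accum_steps = 4                   # effective batch = micro_batch * accum_steps
micro_batch = 8

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 10)            # stand-in for a micro-batch
    y = torch.randint(0, 2, (micro_batch,))
    loss = criterion(model(x), y) / accum_steps # scale so gradients average
    loss.backward()               # frees this micro-batch's graph immediately
optimizer.step()                  # one update with the accumulated gradients
```

Dividing the loss by `accum_steps` makes the accumulated gradient match the average you would get from one large batch; without it you would effectively sum the micro-batch gradients instead.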