How to model gradient descent without running into memory issues

That’s not the case: each tensor chunk would still hold a reference to the entire computation graph, so you would end up using the same amount of memory.
You could instead simulate the larger batch size via gradient accumulation, using one of the approaches described in this post.
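A minimal sketch of gradient accumulation, assuming a hypothetical `nn.Linear` model and random data (the model, sizes, and optimizer here are placeholders, not taken from the original thread). The key point is that calling `backward()` on each micro-batch frees that micro-batch's graph, so only one micro-batch's activations are held in memory at a time:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)          # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

accum_steps = 4                   # effective batch = micro_batch * accum_steps
micro_batch = 8

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 10)            # stand-in for a micro-batch
    y = torch.randint(0, 2, (micro_batch,))
    loss = criterion(model(x), y) / accum_steps # scale so gradients average
    loss.backward()               # frees this micro-batch's graph immediately
optimizer.step()                  # one update with the accumulated gradients
```

Dividing the loss by `accum_steps` makes the accumulated gradient match the average you would get from one large batch; without it you would effectively sum the micro-batch gradients instead.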