That’s not the case: each tensor chunk would still keep its own computation graph (with all intermediate activations) alive until `backward()` is called, so you would end up using roughly the same amount of memory.
You could simulate the larger batch size using one of the approaches described in this post, e.g. gradient accumulation, as sketched below.
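As a minimal sketch of the gradient accumulation approach (assuming `model`, `optimizer`, `criterion`, and `loader` are already defined; the names are placeholders):

```python
accumulation_steps = 4  # effective batch size = loader batch size * 4

optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    output = model(data)
    loss = criterion(output, target)

    # Scale the loss so the accumulated gradients match the average
    # over the larger effective batch.
    (loss / accumulation_steps).backward()

    # Update the parameters and reset the gradients only after
    # accumulating the gradients of `accumulation_steps` mini-batches.
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Since `backward()` frees each mini-batch's graph right away and gradients are summed into `.grad`, the peak memory stays at the small batch size while the update approximates the large one.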