Yes, if you are not detaching the tensors, you are storing their complete computation graphs in the list.
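For illustration, here is a minimal sketch (the model and loop are made up) showing that an appended loss still carries its graph via `.grad_fn`:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
losses = []
for _ in range(3):
    x = torch.randn(8, 10)
    loss = model(x).mean()
    losses.append(loss)                # the graph is kept alive with the tensor

print(losses[0].grad_fn)               # non-None -> graph is still attached
print(losses[0].detach().grad_fn)      # None -> the detached copy has no graph
```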
It depends on your use case. If you need to store the computation graphs to call backward later, then you could reduce the number of iterations. Alternatively, if you don’t need to compute the gradients, you could store the tensors after calling detach() on them.
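As a rough sketch of both options (the model and loop are again hypothetical):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# Option 1: keep the graphs, but over fewer iterations, and call
# backward before the list grows too large (the graphs are freed afterwards).
losses = []
for _ in range(2):
    loss = model(torch.randn(8, 10)).mean()
    losses.append(loss)
torch.stack(losses).sum().backward()

# Option 2: no gradients needed for the stored values -> detach them
# (or store plain Python floats via .item()).
history = []
for _ in range(100):
    loss = model(torch.randn(8, 10)).mean()
    history.append(loss.detach())      # or: history.append(loss.item())
```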
More or less. You would have to take e.g. memory fragmentation into consideration. Also, if you are using cudnn with benchmark=True, different algorithms might be picked for different batch sizes depending on their speed, so you might end up with a different memory footprint.
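Instead of extrapolating linearly from one batch size, you could measure the peak allocated memory per batch size directly. A rough sketch, assuming a CUDA device and a made-up conv model:

```python
import torch
import torch.nn as nn

torch.backends.cudnn.benchmark = True          # cuDNN may pick different algorithms per shape

model = nn.Conv2d(3, 64, 3, padding=1).cuda()

for batch_size in [16, 32, 64]:
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    out = model(x)
    out.mean().backward()
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"batch_size={batch_size}: peak allocated ~{peak:.0f} MB")
```

Note that this only tracks memory allocated by PyTorch's caching allocator, so fragmentation and the cuDNN workspace can still make the real footprint differ between runs.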