I’m running my preprocessing pipeline online, using a DataLoader over a custom Dataset that performs the preprocessing operations in its `__getitem__` method.
I observed high memory usage from the batches, so I’m wondering whether the preprocessed batches that are never sent to the GPU are responsible for this memory bloat. How do I explicitly signal that these batches should be removed from memory?
I know I can use the `del` keyword and rely on the garbage collector to do this (I tried it, and it doesn’t reduce the huge memory footprint), but is there a torch API that can immediately remove a preprocessed batch from memory?
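To make the `del` attempt concrete, here is a minimal sketch of how its effect could be checked. It is plain Python and does not use torch. `make_batches` is a hypothetical stand-in for the DataLoader, and each "batch" is just a `bytearray`, so this only illustrates the reference-counting behaviour and is not a measurement of the real pipeline. Note also that `tracemalloc` only sees allocations made through Python's allocator, so it would not necessarily account for tensor storage.

```python
import gc
import tracemalloc


def make_batches(num_batches: int, batch_bytes: int):
    """Hypothetical stand-in for a DataLoader: yields one large buffer per batch."""
    for _ in range(num_batches):
        yield bytearray(batch_bytes)


def peak_memory(hold_batches: bool, num_batches: int = 8, batch_bytes: int = 1_000_000) -> int:
    """Iterate over all batches and return the peak traced allocation in bytes."""
    gc.collect()
    tracemalloc.start()
    kept = []
    for batch in make_batches(num_batches, batch_bytes):
        if hold_batches:
            kept.append(batch)  # a lingering reference keeps the batch alive
        del batch  # removes only this name; the buffer is freed once no references remain
    gc.collect()  # only matters for objects caught in reference cycles
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


peak_released = peak_memory(hold_batches=False)
peak_held = peak_memory(hold_batches=True)
print(f"peak with batches released: {peak_released} bytes")
print(f"peak with batches held:     {peak_held} bytes")
```

In this sketch `del` keeps the peak low only when nothing else still references the batch. If the footprint stays high after `del`, one possibility worth checking is that something else (a list, a cache, a prefetch queue) still holds the batches.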
Edit: Also, I’m not feeding the batches into an ML model on the GPU or CPU. I’m simply iterating over all batches, without any further processing of the emitted batches.