Will the CUDA memory cache help when loading highly similar batch data?

When we use mini-batches for training, and the next batch is highly similar to the previous one, will the CUDA memory cache help with loading it (compared with a next batch that is totally different from the previous one)?

Pasting the answer from the PM:

The PyTorch caching allocator is only concerned with the reuse of device memory.
That is, it tries to avoid expensive cudaMalloc calls, which would be synchronizing, so it keeps the memory allocated and tries to reuse it. If that's not possible, new memory is allocated and added to the reserved pool.
You might be thinking of the L1 or L2 cache, which is used as a faster on-device memory pool to avoid loading from global GPU memory.
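To make this concrete, here is a minimal sketch (my own addition, not from the original answer) showing that the caching allocator tracks device memory rather than data contents. It uses the real PyTorch introspection calls `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`; the tensor sizes are arbitrary:

```python
import torch

device = "cuda"

# First allocation: the allocator requests device memory via cudaMalloc.
x = torch.randn(1024, 1024, device=device)
print(torch.cuda.memory_allocated())  # bytes currently used by live tensors
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator

# Freeing the tensor does NOT return memory to the driver; it stays
# in the reserved pool so future allocations can avoid cudaMalloc.
del x
print(torch.cuda.memory_allocated())  # drops back toward 0
print(torch.cuda.memory_reserved())   # stays high: the cache is kept

# A new allocation of a matching size is served from the cached pool,
# regardless of whether its data resembles the previous batch.
y = torch.randn(1024, 1024, device=device)

# Only an explicit empty_cache() releases the cached memory to the driver.
del y
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())
```

The point of the sketch: whether two batches hold similar values is invisible to the allocator, so similarity between batches gives no caching benefit at this level.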

Thanks again for your reply!