Cache Memory Allocation

I'm wondering what can cause an increase in cached memory.
In my code, torch.cuda.max_memory_allocated(0) shows only 3 GB, but nvidia-smi shows 10 GB. I expected some caching, but not three times the actually allocated memory. How should I investigate and reduce that?
I think empty_cache() will slow down the algorithm, right?

empty_cache() will force PyTorch to reallocate the memory, if it's needed again, and thus might slow down the code.
The large cache might be created during training, as the forward pass creates intermediate activations, which could be needed for the gradient calculation during the backward pass.
E.g., a simple nn.Conv2d might create an activation that uses a multiple of the input tensor's memory footprint if the number of out_channels is large.
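To narrow it down, you could compare torch.cuda.memory_allocated() (memory currently used by tensors) with torch.cuda.memory_reserved() (memory held by the caching allocator); nvidia-smi roughly reports the reserved amount plus the CUDA context. A rough sketch, assuming a single CUDA device and a toy conv layer:

```python
import torch

device = torch.device("cuda:0")

layer = torch.nn.Conv2d(3, 256, kernel_size=3, padding=1).to(device)
x = torch.randn(16, 3, 224, 224, device=device)   # ~10 MB input
out = layer(x)                                     # ~820 MB output activation

gb = 1024 ** 3
print(f"allocated:     {torch.cuda.memory_allocated(device) / gb:.2f} GB")
print(f"max allocated: {torch.cuda.max_memory_allocated(device) / gb:.2f} GB")
print(f"reserved:      {torch.cuda.memory_reserved(device) / gb:.2f} GB")

# detailed breakdown of the caching allocator's state
print(torch.cuda.memory_summary(device, abbreviated=True))
```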

I understand.
My problem is that I delete a lot of activations and other tensors in my code once they are no longer used, but their memory still goes into the cache. I don't want to use empty_cache(), since, as you said, it would force a reallocation of the memory needed by the model and slow down the code. But is there a way to keep the memory of deleted tensors from going into the cache?

Another example is cloning a tensor, which also adds to the cache!
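Here is a minimal example of what I mean (the tensor sizes are just placeholders, and a CUDA device is assumed): after deleting the clone, the allocated memory drops, but the reserved (cached) memory stays the same.

```python
import torch

device = torch.device("cuda:0")
mb = 1024 ** 2

x = torch.randn(1024, 1024, 256, device=device)   # ~1 GB
y = x.clone()                                      # another ~1 GB

print(torch.cuda.memory_allocated(device) // mb,   # ~2048 MB
      torch.cuda.memory_reserved(device) // mb)    # ~2048 MB

del y   # the tensor is gone, but its memory stays in the cache

print(torch.cuda.memory_allocated(device) // mb,   # back to ~1024 MB
      torch.cuda.memory_reserved(device) // mb)    # still ~2048 MB
```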

Why would you like to avoid creating the cache?
This would force PyTorch to reallocate device memory, which would synchronize the code and thus introduce a potential performance hit.
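As a rough illustration (hypothetical tensor sizes, single CUDA device assumed), calling empty_cache() inside an allocation loop forces repeated free and alloc calls, which synchronize the device and should show up as a slowdown:

```python
import time
import torch

device = torch.device("cuda:0")

def alloc_loop(clear_cache, iters=100):
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(iters):
        t = torch.randn(1024, 1024, 64, device=device)   # ~256 MB
        del t
        if clear_cache:
            torch.cuda.empty_cache()   # hands the block back to the driver
    torch.cuda.synchronize(device)
    return time.perf_counter() - start

print("reusing the cache:   ", alloc_loop(clear_cache=False))
print("empty_cache() calls: ", alloc_loop(clear_cache=True))
```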

I could get a 2x to 4x increase in batch size if the memory of those deleted tensors weren't held in the cache, so I can probably gain performance there.

How did you increase the batch size by removing the cache?
Note that the memory stored in the cache can be reused.
If you empty the cache, PyTorch would have to reallocate the memory the next time it is needed.
Freeing the cache should thus not save any memory, but just slow down your code due to the additional free and alloc calls.
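A small sketch of the reuse (again with hypothetical sizes): after a tensor is deleted, an allocation of the same size is served from the cache, so the reserved memory does not grow:

```python
import torch

device = torch.device("cuda:0")
mb = 1024 ** 2

a = torch.randn(1024, 1024, 128, device=device)   # ~512 MB
del a                                              # memory moves back into the cache

reserved_before = torch.cuda.memory_reserved(device)
b = torch.randn(1024, 1024, 128, device=device)    # served from the cached block
reserved_after = torch.cuda.memory_reserved(device)

# reserved memory does not grow, since the new tensor fits into the block
# that the deleted tensor left behind in the cache
print(reserved_before // mb, reserved_after // mb)
```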