I'm trying to rewrite the CUDA cache memory allocator

Instead of checking the reserved memory (which includes the cache) and subtracting it from the memory reported by nvidia-smi, I would just use torch.cuda.memory_allocated, which returns only the memory actually occupied by live tensors.
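A minimal sketch of the distinction, assuming PyTorch is installed (the helper names `allocated_bytes` and `cached_bytes` are my own; `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved` are real PyTorch APIs):

```python
import torch

def allocated_bytes(device: int = 0) -> int:
    # Bytes currently backing live tensors on `device`; the
    # caching allocator's free cache is NOT included here.
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.memory_allocated(device)

def cached_bytes(device: int = 0) -> int:
    # Bytes the caching allocator has reserved from the driver but
    # that are not currently backing live tensors (the cache itself).
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.memory_reserved(device) - torch.cuda.memory_allocated(device)
```

Note that `torch.cuda.memory_reserved` roughly corresponds to what nvidia-smi attributes to the process (minus CUDA context overhead), so `memory_allocated` alone sidesteps the subtraction entirely.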