Why does GPU memory consumption decrease when I call torch.cuda.memory_allocated?

I defined a torch.autograd.Function and linked it to a CUDA extension module. Strangely, when I added logging to report the current GPU memory consumption, the consumption dropped significantly. The only difference between the two experiments is whether the logging calls are present:

def forward(...):
    ...
    logger.info(f"memory before rasterization: {torch.cuda.memory_allocated()/1024**2:,.02f} MB")
    # Invoke C++/CUDA rasterizer
    num_rendered, color, depth_map, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
    logger.info(f"memory after rasterization: {torch.cuda.memory_allocated()/1024**2:,.02f} MB")
    ...

def backward(...):
    ...
    logger.info(f"memory before backward: {torch.cuda.memory_allocated()/1024**2:,.02f} MB")

    # Compute gradients for relevant tensors by invoking the backward method
    grad_means2D, grad_colors_precomp, grad_depths_precomp, grad_opacities, grad_means3D, grad_cov3Ds_precomp, grad_sh, grad_scales, grad_rotations = _C.rasterize_gaussians_backward(*args)

    logger.info(f"memory after backward: {torch.cuda.memory_allocated()/1024**2:,.02f} MB")
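For reference, the overall structure can be reproduced on CPU with a toy Function. This is not the actual rasterizer, just a minimal, hypothetical sketch: `LoggedIdentity` and the `* 2.0` bodies stand in for the `_C.rasterize_gaussians` / `_C.rasterize_gaussians_backward` calls, and `torch.cuda.memory_allocated()` simply returns 0 when CUDA is uninitialized, so the sketch runs anywhere:

```python
import torch

class LoggedIdentity(torch.autograd.Function):
    """Hypothetical stand-in for the CUDA rasterizer Function (CPU-runnable)."""

    @staticmethod
    def forward(ctx, x):
        before = torch.cuda.memory_allocated()  # returns 0 if CUDA is uninitialized
        out = x * 2.0  # placeholder for _C.rasterize_gaussians(*args)
        after = torch.cuda.memory_allocated()
        print(f"memory before/after forward: {before/1024**2:,.02f} / {after/1024**2:,.02f} MB")
        return out

    @staticmethod
    def backward(ctx, grad_out):
        before = torch.cuda.memory_allocated()
        grad_in = grad_out * 2.0  # placeholder for _C.rasterize_gaussians_backward(*args)
        after = torch.cuda.memory_allocated()
        print(f"memory before/after backward: {before/1024**2:,.02f} / {after/1024**2:,.02f} MB")
        return grad_in

x = torch.ones(3, requires_grad=True)
y = LoggedIdentity.apply(x)
y.sum().backward()
```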

With the logging calls present, the GPU consumption is:

Without the logging calls, the GPU consumption is:

I swear the only difference between the two runs is the code above. It is really strange, because torch.cuda.memory_allocated should not influence GPU memory consumption. The peak-memory figure shown in the progress bar, which is measured at the same place in the code, is also unchanged between the two experiments.
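One thing worth separating when comparing runs is which counter is being read: `memory_allocated` counts bytes held by live tensors, `memory_reserved` counts bytes cached by PyTorch's allocator (closer to what nvidia-smi reports), and `max_memory_allocated` is the high-water mark. All three are cheap counter reads and launch no kernels. A small helper, written as a sketch (the function name is mine, and all values are 0 on a CPU-only run):

```python
import torch

def memory_report(tag: str):
    """Print and return (allocated, reserved, peak) bytes for the current device.

    These are passive reads of the caching allocator's counters; they do not
    allocate, free, or synchronize. On a CPU-only build they all return 0.
    """
    alloc = torch.cuda.memory_allocated()       # bytes held by live tensors
    reserved = torch.cuda.memory_reserved()     # bytes cached by the allocator
    peak = torch.cuda.max_memory_allocated()    # high-water mark since last reset
    print(f"{tag}: allocated={alloc/1024**2:.2f} MB "
          f"reserved={reserved/1024**2:.2f} MB peak={peak/1024**2:.2f} MB")
    return alloc, reserved, peak
```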

Updated experiment results:
I ran more experiments.

  1. I removed the loguru.logger calls and only assigned the query result to a variable, e.g. memory_before_rasterization = torch.cuda.memory_allocated(). The result is the same: low consumption when the variable is defined, high consumption when it is not.
  2. I removed the queries from backward while keeping the ones in forward. The result is still the same.
  3. I removed the queries from forward instead, and the GPU consumption went back up, just as before.
  4. The GPU consumption stays in the low range only when torch.cuda.memory_allocated is called both before and after _C.rasterize_gaussians(*args).
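To make this A/B comparison reproducible, the toggle can be isolated in a small harness. This is a hypothetical sketch (the names `ab_run` and `fn` are mine, and `fn` stands in for the rasterizer call from the post); it returns the peak allocation so the two configurations can be compared on equal footing:

```python
import torch

def ab_run(fn, query_memory: bool) -> int:
    """Run `fn` with or without memory queries around it; return peak bytes.

    `fn` is a stand-in for the _C.rasterize_gaussians call. The only
    difference between the two configurations is the two counter reads.
    """
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()  # start from a clean high-water mark
    if query_memory:
        _ = torch.cuda.memory_allocated()     # query before, as in the low-memory runs
    fn()
    if query_memory:
        _ = torch.cuda.memory_allocated()     # query after
    return torch.cuda.max_memory_allocated()  # 0 on a CPU-only build

# Usage: compare ab_run(step, True) against ab_run(step, False),
# where `step` performs one forward/backward pass.
```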

So I wonder: can merely querying the GPU memory consumption somehow influence the memory consumption itself?