Weird Profiler CPU Memory Deallocation

Hello,

I’m trying to measure the CPU memory allocated for an application but I witnessed something strange. Here’s a minimum working example:

import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

class SimpleModule(nn.Module):
    def __init__(
        self,
        dim: int,
    ):
        super().__init__()
        self.dim = dim
        self.op = nn.Linear(dim, dim, bias=False)

    def forward(
        self,
        input1: torch.Tensor,
        input2: torch.Tensor,
    ):
        output1 = torch.empty([self.dim, self.dim])
        torch.mm(input1, input2, out=output1)
        output = self.op(output1)
        return output


for dim in [2048, 4096, 1024]:
    module = SimpleModule(dim)
    module.eval()
    input_tensor1 = torch.rand(dim, dim) * 1.e-2
    input_tensor2 = torch.rand(dim, dim) * 1.e-2
    with profile(activities=[ProfilerActivity.CPU], profile_memory=True, record_shapes=True) as prof:
        _ = module(input_tensor1, input_tensor2)
    print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=100))

The output looks like this:

STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::empty         0.01%      21.000us         0.01%      21.000us      21.000us      16.00 Mb      16.00 Mb             1  
              aten::mm        99.89%     157.283ms        99.89%     157.283ms      78.641ms      16.00 Mb      16.00 Mb             2  
    aten::resolve_conj         0.00%       0.000us         0.00%       0.000us       0.000us           0 b           0 b             5  
          aten::linear         0.04%      58.000us        43.73%      68.859ms      68.859ms      16.00 Mb           0 b             1  
               aten::t         0.03%      45.000us         0.05%      86.000us      86.000us           0 b           0 b             1  
       aten::transpose         0.02%      34.000us         0.03%      41.000us      41.000us           0 b           0 b             1  
      aten::as_strided         0.00%       7.000us         0.00%       7.000us       7.000us           0 b           0 b             1  
          aten::matmul         0.01%      10.000us        43.64%      68.715ms      68.715ms      16.00 Mb           0 b             1  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 157.458ms

STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
[W CPUAllocator.cpp:235] Memory block of unknown size was allocated before the profiling started, profiler results will not include the deallocation event
STAGE:2023-08-01 11:33:59 46581:46581 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:33:59 46581:46581 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::empty         0.00%      28.000us         0.00%      28.000us      28.000us      64.00 Mb      64.00 Mb             1  
              aten::mm        99.99%        1.400s        99.99%        1.400s     700.049ms      64.00 Mb      64.00 Mb             2  
    aten::resolve_conj         0.00%       3.000us         0.00%       3.000us       0.600us           0 b           0 b             5  
          aten::linear         0.00%      18.000us        52.85%     740.019ms     740.019ms      64.00 Mb           0 b             1  
               aten::t         0.00%      37.000us         0.00%      65.000us      65.000us           0 b           0 b             1  
       aten::transpose         0.00%      19.000us         0.00%      28.000us      28.000us           0 b           0 b             1  
      aten::as_strided         0.00%       9.000us         0.00%       9.000us       9.000us           0 b           0 b             1  
          aten::matmul         0.00%      13.000us        52.84%     739.936ms     739.936ms      64.00 Mb           0 b             1  
              [memory]         0.00%       0.000us         0.00%       0.000us       0.000us     -32.00 Mb     -32.00 Mb             2  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 1.400s

STAGE:2023-08-01 11:34:00 46581:46581 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:34:00 46581:46581 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:34:00 46581:46581 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::empty         0.04%      11.000us         0.04%      11.000us      11.000us       4.00 Mb       4.00 Mb             1  
              aten::mm        99.60%      24.428ms        99.60%      24.428ms      12.214ms       4.00 Mb       4.00 Mb             2  
    aten::resolve_conj         0.00%       0.000us         0.00%       0.000us       0.000us           0 b           0 b             5  
          aten::linear         0.07%      17.000us        42.36%      10.388ms      10.388ms       4.00 Mb           0 b             1  
               aten::t         0.15%      37.000us         0.22%      55.000us      55.000us           0 b           0 b             1  
       aten::transpose         0.05%      13.000us         0.07%      18.000us      18.000us           0 b           0 b             1  
      aten::as_strided         0.02%       5.000us         0.02%       5.000us       5.000us           0 b           0 b             1  
          aten::matmul         0.06%      14.000us        42.06%      10.316ms      10.316ms       4.00 Mb           0 b             1  
              [memory]         0.00%       0.000us         0.00%       0.000us       0.000us    -128.00 Mb    -128.00 Mb             2  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 24.525ms

And here are the output of running these three tests separately.

2048:

STAGE:2023-08-01 11:37:04 46830:46830 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:37:05 46830:46830 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:37:05 46830:46830 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::empty         0.01%      25.000us         0.01%      25.000us      25.000us      16.00 Mb      16.00 Mb             1  
              aten::mm        99.90%     171.833ms        99.90%     171.834ms      85.917ms      16.00 Mb      16.00 Mb             2  
    aten::resolve_conj         0.00%       1.000us         0.00%       1.000us       0.200us           0 b           0 b             5  
          aten::linear         0.03%      51.000us        58.07%      99.887ms      99.887ms      16.00 Mb           0 b             1  
               aten::t         0.03%      48.000us         0.05%      85.000us      85.000us           0 b           0 b             1  
       aten::transpose         0.02%      32.000us         0.02%      37.000us      37.000us           0 b           0 b             1  
      aten::as_strided         0.00%       5.000us         0.00%       5.000us       5.000us           0 b           0 b             1  
          aten::matmul         0.01%      10.000us        57.99%      99.751ms      99.751ms      16.00 Mb           0 b             1  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 172.005ms

4096:

STAGE:2023-08-01 11:38:18 46994:46994 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:38:19 46994:46994 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:38:19 46994:46994 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::empty         0.00%      14.000us         0.00%      14.000us      14.000us      64.00 Mb      64.00 Mb             1  
              aten::mm        99.98%        1.426s        99.98%        1.426s     713.101ms      64.00 Mb      64.00 Mb             2  
    aten::resolve_conj         0.00%       5.000us         0.00%       5.000us       1.000us           0 b           0 b             5  
          aten::linear         0.01%     100.000us        51.83%     739.413ms     739.413ms      64.00 Mb           0 b             1  
               aten::t         0.01%      82.000us         0.01%     143.000us     143.000us           0 b           0 b             1  
       aten::transpose         0.00%      45.000us         0.00%      61.000us      61.000us           0 b           0 b             1  
      aten::as_strided         0.00%      16.000us         0.00%      16.000us      16.000us           0 b           0 b             1  
          aten::matmul         0.00%      16.000us        51.82%     739.170ms     739.170ms      64.00 Mb           0 b             1  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 1.426s

1024:

STAGE:2023-08-01 11:38:45 47144:47144 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:38:45 47144:47144 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:38:45 47144:47144 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
           aten::empty         0.06%      15.000us         0.06%      15.000us      15.000us       4.00 Mb       4.00 Mb             1  
              aten::mm        99.36%      23.708ms        99.39%      23.714ms      11.857ms       4.00 Mb       4.00 Mb             2  
    aten::resolve_conj         0.03%       6.000us         0.03%       6.000us       1.200us           0 b           0 b             5  
          aten::linear         0.19%      45.000us        38.24%       9.125ms       9.125ms       4.00 Mb           0 b             1  
               aten::t         0.20%      48.000us         0.32%      77.000us      77.000us           0 b           0 b             1  
       aten::transpose         0.10%      25.000us         0.12%      29.000us      29.000us           0 b           0 b             1  
      aten::as_strided         0.02%       4.000us         0.02%       4.000us       4.000us           0 b           0 b             1  
          aten::matmul         0.04%       9.000us        37.73%       9.003ms       9.003ms       4.00 Mb           0 b             1  
----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 23.860ms

My question is what exactly does this [memory] line with negative value mean? From the aggregated result it seems like the 2nd and 3rd tests were deallocating the memory allocated in the previous tests (1st and 2nd respectively). How do we eliminate the effect of it from the previous run?

Also, in my real application this [memory] line occurs for every standalone test, I wonder if we can use this value to indicate the memory allocated if the above mentioned effect is eliminated? Or we’d better sum up all positive values in the “Self CPU Mem” column?

Thanks in advance!