Hello,
I’m trying to measure the CPU memory allocated for an application but I witnessed something strange. Here’s a minimum working example:
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity
class SimpleModule(nn.Module):
def __init__(
self,
dim: int,
):
super().__init__()
self.dim = dim
self.op = nn.Linear(dim, dim, bias=False)
def forward(
self,
input1: torch.Tensor,
input2: torch.Tensor,
):
output1 = torch.empty([self.dim, self.dim])
torch.mm(input1, input2, out=output1)
output = self.op(output1)
return output
for dim in [2048, 4096, 1024]:
module = SimpleModule(dim)
module.eval()
input_tensor1 = torch.rand(dim, dim) * 1.e-2
input_tensor2 = torch.rand(dim, dim) * 1.e-2
with profile(activities=[ProfilerActivity.CPU], profile_memory=True, record_shapes=True) as prof:
_ = module(input_tensor1, input_tensor2)
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=100))
The output looks like this:
STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.01% 21.000us 0.01% 21.000us 21.000us 16.00 Mb 16.00 Mb 1
aten::mm 99.89% 157.283ms 99.89% 157.283ms 78.641ms 16.00 Mb 16.00 Mb 2
aten::resolve_conj 0.00% 0.000us 0.00% 0.000us 0.000us 0 b 0 b 5
aten::linear 0.04% 58.000us 43.73% 68.859ms 68.859ms 16.00 Mb 0 b 1
aten::t 0.03% 45.000us 0.05% 86.000us 86.000us 0 b 0 b 1
aten::transpose 0.02% 34.000us 0.03% 41.000us 41.000us 0 b 0 b 1
aten::as_strided 0.00% 7.000us 0.00% 7.000us 7.000us 0 b 0 b 1
aten::matmul 0.01% 10.000us 43.64% 68.715ms 68.715ms 16.00 Mb 0 b 1
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 157.458ms
STAGE:2023-08-01 11:33:58 46581:46581 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
[W CPUAllocator.cpp:235] Memory block of unknown size was allocated before the profiling started, profiler results will not include the deallocation event
STAGE:2023-08-01 11:33:59 46581:46581 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:33:59 46581:46581 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.00% 28.000us 0.00% 28.000us 28.000us 64.00 Mb 64.00 Mb 1
aten::mm 99.99% 1.400s 99.99% 1.400s 700.049ms 64.00 Mb 64.00 Mb 2
aten::resolve_conj 0.00% 3.000us 0.00% 3.000us 0.600us 0 b 0 b 5
aten::linear 0.00% 18.000us 52.85% 740.019ms 740.019ms 64.00 Mb 0 b 1
aten::t 0.00% 37.000us 0.00% 65.000us 65.000us 0 b 0 b 1
aten::transpose 0.00% 19.000us 0.00% 28.000us 28.000us 0 b 0 b 1
aten::as_strided 0.00% 9.000us 0.00% 9.000us 9.000us 0 b 0 b 1
aten::matmul 0.00% 13.000us 52.84% 739.936ms 739.936ms 64.00 Mb 0 b 1
[memory] 0.00% 0.000us 0.00% 0.000us 0.000us -32.00 Mb -32.00 Mb 2
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 1.400s
STAGE:2023-08-01 11:34:00 46581:46581 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:34:00 46581:46581 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:34:00 46581:46581 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.04% 11.000us 0.04% 11.000us 11.000us 4.00 Mb 4.00 Mb 1
aten::mm 99.60% 24.428ms 99.60% 24.428ms 12.214ms 4.00 Mb 4.00 Mb 2
aten::resolve_conj 0.00% 0.000us 0.00% 0.000us 0.000us 0 b 0 b 5
aten::linear 0.07% 17.000us 42.36% 10.388ms 10.388ms 4.00 Mb 0 b 1
aten::t 0.15% 37.000us 0.22% 55.000us 55.000us 0 b 0 b 1
aten::transpose 0.05% 13.000us 0.07% 18.000us 18.000us 0 b 0 b 1
aten::as_strided 0.02% 5.000us 0.02% 5.000us 5.000us 0 b 0 b 1
aten::matmul 0.06% 14.000us 42.06% 10.316ms 10.316ms 4.00 Mb 0 b 1
[memory] 0.00% 0.000us 0.00% 0.000us 0.000us -128.00 Mb -128.00 Mb 2
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 24.525ms
And here are the output of running these three tests separately.
2048:
STAGE:2023-08-01 11:37:04 46830:46830 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:37:05 46830:46830 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:37:05 46830:46830 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.01% 25.000us 0.01% 25.000us 25.000us 16.00 Mb 16.00 Mb 1
aten::mm 99.90% 171.833ms 99.90% 171.834ms 85.917ms 16.00 Mb 16.00 Mb 2
aten::resolve_conj 0.00% 1.000us 0.00% 1.000us 0.200us 0 b 0 b 5
aten::linear 0.03% 51.000us 58.07% 99.887ms 99.887ms 16.00 Mb 0 b 1
aten::t 0.03% 48.000us 0.05% 85.000us 85.000us 0 b 0 b 1
aten::transpose 0.02% 32.000us 0.02% 37.000us 37.000us 0 b 0 b 1
aten::as_strided 0.00% 5.000us 0.00% 5.000us 5.000us 0 b 0 b 1
aten::matmul 0.01% 10.000us 57.99% 99.751ms 99.751ms 16.00 Mb 0 b 1
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 172.005ms
4096:
STAGE:2023-08-01 11:38:18 46994:46994 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:38:19 46994:46994 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:38:19 46994:46994 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.00% 14.000us 0.00% 14.000us 14.000us 64.00 Mb 64.00 Mb 1
aten::mm 99.98% 1.426s 99.98% 1.426s 713.101ms 64.00 Mb 64.00 Mb 2
aten::resolve_conj 0.00% 5.000us 0.00% 5.000us 1.000us 0 b 0 b 5
aten::linear 0.01% 100.000us 51.83% 739.413ms 739.413ms 64.00 Mb 0 b 1
aten::t 0.01% 82.000us 0.01% 143.000us 143.000us 0 b 0 b 1
aten::transpose 0.00% 45.000us 0.00% 61.000us 61.000us 0 b 0 b 1
aten::as_strided 0.00% 16.000us 0.00% 16.000us 16.000us 0 b 0 b 1
aten::matmul 0.00% 16.000us 51.82% 739.170ms 739.170ms 64.00 Mb 0 b 1
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 1.426s
1024:
STAGE:2023-08-01 11:38:45 47144:47144 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-08-01 11:38:45 47144:47144 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-08-01 11:38:45 47144:47144 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.06% 15.000us 0.06% 15.000us 15.000us 4.00 Mb 4.00 Mb 1
aten::mm 99.36% 23.708ms 99.39% 23.714ms 11.857ms 4.00 Mb 4.00 Mb 2
aten::resolve_conj 0.03% 6.000us 0.03% 6.000us 1.200us 0 b 0 b 5
aten::linear 0.19% 45.000us 38.24% 9.125ms 9.125ms 4.00 Mb 0 b 1
aten::t 0.20% 48.000us 0.32% 77.000us 77.000us 0 b 0 b 1
aten::transpose 0.10% 25.000us 0.12% 29.000us 29.000us 0 b 0 b 1
aten::as_strided 0.02% 4.000us 0.02% 4.000us 4.000us 0 b 0 b 1
aten::matmul 0.04% 9.000us 37.73% 9.003ms 9.003ms 4.00 Mb 0 b 1
---------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 23.860ms
My question is what exactly does this [memory] line with negative value mean? From the aggregated result it seems like the 2nd and 3rd tests were deallocating the memory allocated in the previous tests (1st and 2nd respectively). How do we eliminate the effect of it from the previous run?
Also, in my real application this [memory] line occurs for every standalone test, I wonder if we can use this value to indicate the memory allocated if the above mentioned effect is eliminated? Or we’d better sum up all positive values in the “Self CPU Mem” column?
Thanks in advance!