Can somebody help me understand the following output log, generated using the autograd profiler with memory profiling enabled? My specific questions are:
- What’s the difference between CUDA Mem and Self CUDA Mem?
- Why are some of the memory stats negative, and how should I reason about them?
- How do I compute the total memory utilization (the total averages displayed at the bottom)?
Thanks in advance!
Name                               CUDA Mem      Self CUDA Mem
---------------------------------  ------------  -------------
aten::empty                        3.06 Gb       3.06 Gb
aten::random_                      0 b           0 b
aten::is_floating_point            0 b           0 b
aten::item                         0 b           0 b
aten::randperm                     0 b           0 b
aten::randn                        6.50 Kb       0 b
aten::randint                      0 b           0 b
aten::select                       0 b           0 b
aten::mul                          3.00 Kb       0 b
aten::set_                         0 b           0 b
aten::view                         0 b           0 b
aten::permute                      0 b           0 b
aten::contiguous                   0 b           0 b
aten::div                          0 b           0 b
aten::stack                        0 b           0 b
aten::zeros                        0 b           0 b
Copy data to device                1.50 Mb       0 b
Forward D0                         625.91 Mb     -2.00 Kb
aten::binary_cross_entropy         1.50 Kb       -1.50 Kb
Backward D0                        1.00 Kb       -1.00 Kb
MulBackward0                       1.50 Kb       0 b
BinaryCrossEntropyBackward         1.50 Kb       0 b
SqueezeBackward1                   0 b           0 b
ViewBackward                       0 b           0 b
SigmoidBackward                    1.50 Kb       0 b
CudnnConvolutionBackward           1.62 Gb       0 b
torch::autograd::CopyBackwards     2.45 Gb       0 b
torch::autograd::AccumulateGrad    846.79 Mb     0 b
LeakyReluBackward1                 409.50 Mb     0 b
CudnnBatchNormBackward             225.16 Mb     0 b
Forward G0                         22.58 Mb      -770.00 Kb
Forward D1                         625.16 Mb     -2.50 Kb
Backward D1                        0 b           -1.00 Kb
Optimizer D                        512 b         -2.50 Kb
Forward D2                         625.16 Mb     -2.50 Kb
Backward G                         0 b           -1.00 Kb
TanhBackward                       768.00 Kb     0 b
CudnnConvolutionTransposeBackward  14.32 Mb      0 b
ReluBackward1                      7.50 Mb       0 b
Optimizer G                        512 b         -2.50 Kb
---------------------------------  ------------  -------------
Self CPU time total: 11.786s
CUDA time total: 12.148s
<FunctionEventAvg key=Total self_cpu_time=11.786s cpu_time=9.369ms self_cuda_time=12.148s cuda_time=10.153ms input_shapes=[[1]] cpu_memory_usage=20845792 cuda_memory_usage=28655543808>
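To make the summary line easier to compare with the table, here is a minimal sketch that pulls the two `*_memory_usage` fields out of it and prints them in the table's units. The `format_bytes` helper is hypothetical (it is not part of PyTorch), and it assumes the profiler's units are binary, i.e. 1 Kb = 1024 b. It only converts units; it says nothing about how the profiler arrives at the totals.

```python
import re

# Summary line copied verbatim from the profiler output above.
SUMMARY = (
    "<FunctionEventAvg key=Total self_cpu_time=11.786s cpu_time=9.369ms "
    "self_cuda_time=12.148s cuda_time=10.153ms input_shapes=[[1]] "
    "cpu_memory_usage=20845792 cuda_memory_usage=28655543808>"
)


def format_bytes(n):
    """Hypothetical helper: render a byte count in the table's style.

    Assumes binary units (1 Kb = 1024 b), which is an assumption here.
    """
    sign = "-" if n < 0 else ""
    n = abs(n)
    for unit, size in (("Gb", 1024 ** 3), ("Mb", 1024 ** 2), ("Kb", 1024)):
        if n >= size:
            return f"{sign}{n / size:.2f} {unit}"
    return f"{sign}{n} b"


# Extract every "<name>_memory_usage=<integer>" pair from the summary line.
usage = {
    name: int(value)
    for name, value in re.findall(r"(\w+_memory_usage)=(-?\d+)", SUMMARY)
}

for name, value in usage.items():
    print(f"{name}: {value} bytes = {format_bytes(value)}")
```

Under that assumption, the reported totals come out to roughly 26.69 Gb of CUDA memory and 19.88 Mb of CPU memory.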