I am currently working on tensor-aware energy accounting (https://dl.acm.org/doi/10.1145/3597503.3639156) for machine learning models. The goal is to attribute energy consumption to specific tensors within the model.
I was able to generate the trace and to extract the forward and backward flow segments as well as the flows to the GPU, but I am not able to determine which events belong to which tensor operations.
For example, given the following events:
...
{
"ph": "X", "cat": "cpu_op", "name": "autograd::engine::evaluate_function: ViewBackward0", "pid": 112835, "tid": 112835,
"ts": 5284315312688.329, "dur": 21.660,
"args": {
"External id": 885,"Record function id": 0, "Sequence number": 125, "Fwd thread id": 1, "Ev Idx": 884
}
},
{
"ph": "X", "cat": "cpu_op", "name": "ViewBackward0", "pid": 112835, "tid": 112835,
"ts": 5284315312690.610, "dur": 17.072,
"args": {
"External id": 886,"Record function id": 0, "Sequence number": 125, "Fwd thread id": 1, "Ev Idx": 885
}
}
...
How do I determine which ViewBackward0 operation in the whole model the slice with External id 886 belongs to?
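Both events above carry "Sequence number": 125, so one idea I have been exploring is to look up the forward op in the same trace that has the same sequence number. The sketch below is only that, a sketch: the function names are my own, and it assumes (unverified) that forward cpu_ops carry "Sequence number" without "Fwd thread id", while backward ops carry both. Sequence numbers are tracked per thread, so with several forward threads the lookup would probably also need to be keyed on the thread.

```python
import json
from collections import defaultdict


def load_events(path):
    """Load the list of trace events from a Chrome trace JSON file."""
    with open(path) as f:
        data = json.load(f)
    # Chrome traces are either a bare list or an object with "traceEvents".
    return data["traceEvents"] if isinstance(data, dict) else data


def link_backward_to_forward(events):
    """Map each backward cpu_op's External id to a candidate forward op.

    Assumption: a forward op and the backward ops it produces share the same
    "Sequence number", and only backward ops carry "Fwd thread id".
    """
    forward_by_seq = defaultdict(list)
    backward = []
    for ev in events:
        if ev.get("ph") != "X" or ev.get("cat") != "cpu_op":
            continue
        args = ev.get("args", {})
        seq = args.get("Sequence number")
        if seq is None:
            continue
        if "Fwd thread id" in args:
            backward.append(ev)
        else:
            forward_by_seq[seq].append(ev)

    links = {}
    for ev in backward:
        candidates = forward_by_seq.get(ev["args"]["Sequence number"], [])
        if candidates:
            # Nested forward ops may share a sequence number; take the
            # outermost one (earliest start, then longest duration).
            fwd = min(candidates, key=lambda e: (e["ts"], -e.get("dur", 0)))
            links[ev["args"]["External id"]] = fwd
    return links
```

With links = link_backward_to_forward(load_events("trace.json")), links[886] would be the forward op (for example a particular aten::view call). If the trace was recorded with record_shapes=True and with_stack=True in torch.profiler.profile, that forward event's args should include input shapes and a Python stack, which point to where in the model the view was created.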
I am guessing that determining the data flow would help here, but I find it challenging to trace the flow of data through the model using only Chrome tracing tools. Has anyone encountered a similar issue, or does anyone have suggestions on how to better understand and determine the data flow based only on the Chrome trace?
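For the flow segments I already have, this is roughly how I am trying to turn each forward/backward flow into a pair of ops. It is a hedged sketch: it assumes the trace contains flow events with "cat": "fwdbwd", where the "s" (start) point sits on the forward op and the "f" (finish) point sits on the backward op with a shared "id", and it binds each flow point to the innermost complete ("X") slice on the same pid/tid that encloses its timestamp. The function name is hypothetical.

```python
from collections import defaultdict


def resolve_fwdbwd_flows(events):
    """Pair forward and backward slices connected by "fwdbwd" flow events.

    Assumption: "s" points mark the forward op, "f" points mark the backward
    op, and both share the same flow "id".
    """
    # Index complete ("X") events per (pid, tid) so a flow point can be
    # bound to the slice that encloses its timestamp.
    slices = defaultdict(list)
    for ev in events:
        if ev.get("ph") == "X":
            slices[(ev["pid"], ev["tid"])].append(ev)

    def enclosing_slice(point):
        candidates = [
            ev for ev in slices[(point["pid"], point["tid"])]
            if ev["ts"] <= point["ts"] <= ev["ts"] + ev.get("dur", 0)
        ]
        # Innermost slice: latest start, then shortest duration.
        return max(candidates, key=lambda e: (e["ts"], -e.get("dur", 0)),
                   default=None)

    starts, finishes = {}, {}
    for ev in events:
        if ev.get("cat") != "fwdbwd":
            continue
        if ev.get("ph") == "s":
            starts[ev["id"]] = enclosing_slice(ev)
        elif ev.get("ph") == "f":
            finishes[ev["id"]] = enclosing_slice(ev)

    # flow id -> (forward slice, backward slice or None if unmatched)
    return {flow_id: (starts[flow_id], finishes.get(flow_id))
            for flow_id in starts}
```

This only recovers forward-to-backward links per op, not which tensor flows between ops, which is the part I am still missing.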
Any insights or recommendations would be greatly appreciated.