How to interpret the output of CommDebugMode

How should the following CommDebugMode output be interpreted?

FORWARD PASS
*c10d_functional.all_reduce: 2
**aten._foreach_norm.Scalar
**aten._foreach_norm.Scalar
**aten.stack.default
**aten.div.Tensor
**aten.div.Tensor
**aten.div.Tensor
**aten.div.Tensor
**aten.stack.default
**aten.linalg_vector_norm.default
shape: [torch.Size([13])]
sharding: [(_NormPartial(reduce_op='sum', norm_type=1.0), _NormPartial(reduce_op='sum', norm_type=1.0))]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**aten.linalg_vector_norm.default
**aten.add.Tensor
shape: [torch.Size()]
sharding: [(_NormPartial(reduce_op='sum', norm_type=1.0), _NormPartial(reduce_op='sum', norm_type=1.0))]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**aten.add.Tensor
**aten.reciprocal.default
shape: [torch.Size()]
sharding: [(_NormPartial(reduce_op='sum', norm_type=1.0), _NormPartial(reduce_op='sum', norm_type=1.0))]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**_c10d_functional.all_reduce.default
**_c10d_functional.all_reduce.default
**_c10d_functional.wait_tensor.default
**aten.reciprocal.default
**aten.mul.Tensor
shape: [torch.Size()]
sharding: [(Replicate(), Replicate())]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**aten.mul.Tensor
**aten.clamp.default
shape: [torch.Size()]
sharding: [(Replicate(), Replicate())]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**aten.clamp.default
**aten.foreach_mul.Tensor
shape: [torch.Size()]
sharding: [(Replicate(), Replicate())]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**aten.foreach_mul.Tensor
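
One way to start reading a trace like this is to separate its kinds of lines. The sketch below is only a reading of the printed format, not confirmed behaviour of CommDebugMode: it assumes that a line with a single `*` is a per-collective count, that lines with `**` are individual operations, and that the `shape` / `sharding` / `device mesh` lines describe the operation printed immediately before them. The names `parse_trace`, `SAMPLE`, and `META_KEYS` are hypothetical helpers, and `SAMPLE` is just a few lines copied from the trace above.

```python
# A few lines copied from the trace above (quotes normalised to ASCII).
SAMPLE = """\
FORWARD PASS
*c10d_functional.all_reduce: 2
**aten.linalg_vector_norm.default
shape: [torch.Size([13])]
sharding: [(_NormPartial(reduce_op='sum', norm_type=1.0), _NormPartial(reduce_op='sum', norm_type=1.0))]
device mesh: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp'))
**aten.linalg_vector_norm.default
**_c10d_functional.all_reduce.default
**_c10d_functional.all_reduce.default
**_c10d_functional.wait_tensor.default
**aten.mul.Tensor
shape: [torch.Size()]
sharding: [(Replicate(), Replicate())]
"""

# Assumption: these three keys are the only metadata lines in the trace.
META_KEYS = ("shape", "sharding", "device mesh")


def parse_trace(text):
    """Split a star-prefixed trace into collective counts and a list of ops.

    Assumes metadata lines belong to the op line directly above them.
    """
    counts = {}
    ops = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        stars = len(line) - len(line.lstrip("*"))
        body = line[stars:]
        if stars == 1 and ": " in body:
            # e.g. "c10d_functional.all_reduce: 2"
            name, _, n = body.rpartition(": ")
            counts[name] = int(n)
        elif stars >= 2:
            ops.append({"op": body, "meta": {}})
        else:
            # Unstarred line: either a header or metadata for the last op.
            key, sep, value = body.partition(": ")
            if sep and key in META_KEYS and ops:
                ops[-1]["meta"][key] = value
    return counts, ops


counts, ops = parse_trace(SAMPLE)
for entry in ops:
    print(entry["op"], entry["meta"].get("sharding", ""))
```

Read this way, the summary count of 2 for `all_reduce` lines up with the two `_c10d_functional.all_reduce.default` entries in the operation list, and the `sharding` field shows where the listed placements change from `_NormPartial` to `Replicate()`.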