DTensor TP collectives missing?

This doc and the paper on TP mention the following.

Figure 3. Blocks of Transformer with Model Parallelism. f and g are conjugate. f is an identity operator in the forward pass and all reduce in the backward pass while g is an allreduce in the forward pass and identity in the backward pass.
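To make the caption concrete, here is a minimal single-process sketch of what f and g do for a two-way split of the MLP. It is not DTensor code: the "ranks" are simulated as list entries, `all_reduce` is a plain elementwise sum, and the names `mlp_forward_backward`, `A_shards` and `B_shards` are made up for illustration. It assumes the split from the paper, with the first weight sharded by columns and the second by rows.

```python
# Single-process simulation of a 2-way tensor-parallel MLP: Z = relu(X @ A) @ B.
# Matrices are plain nested lists of ints so the comparison below is exact.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def relu(a):
    return [[max(v, 0) for v in row] for row in a]

def relu_grad(pre, grad):
    # Pass the gradient through only where the pre-activation was positive.
    return [[g if p > 0 else 0 for p, g in zip(prow, grow)]
            for prow, grow in zip(pre, grad)]

def all_reduce(parts):
    # Stand-in for a real all-reduce: elementwise sum over the simulated ranks.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*parts)]

def mlp_forward_backward(X, A, B, dZ):
    # Local computation; works for full weights or for one rank's shards.
    H = matmul(X, A)
    Z = matmul(relu(H), B)
    dY = matmul(dZ, transpose(B))
    dH = relu_grad(H, dY)
    dX = matmul(dH, transpose(A))
    return Z, dX

X = [[1, -2], [3, 1]]
A = [[1, -1, 2, 0], [0, 2, -1, 1]]        # 2 x 4
B = [[1, 0], [2, -1], [0, 1], [-1, 1]]    # 4 x 2
dZ = [[1, 1], [1, -1]]                    # upstream gradient w.r.t. Z

# Shard A by columns and B by rows, one shard per simulated rank.
A_shards = [[row[:2] for row in A], [row[2:] for row in A]]
B_shards = [B[:2], B[2:]]

partial_Z, partial_dX = [], []
for A_i, B_i in zip(A_shards, B_shards):
    # f forward is identity: every rank receives the same X.
    # g backward is identity: every rank receives the same dZ.
    Z_i, dX_i = mlp_forward_backward(X, A_i, B_i, dZ)
    partial_Z.append(Z_i)
    partial_dX.append(dX_i)

Z_tp = all_reduce(partial_Z)     # g forward: all-reduce of partial outputs
dX_tp = all_reduce(partial_dX)   # f backward: all-reduce of partial input grads

# Unsharded reference for comparison.
Z_ref, dX_ref = mlp_forward_backward(X, A, B, dZ)
print(Z_tp == Z_ref, dX_tp == dX_ref)
```

In this sketch the backward all-reduce is only needed for the gradient with respect to the input X. The weight gradients for each shard can be computed locally from X, the local activations, and dZ.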

However, when I run this TP example of an MLP, I see only one allreduce, corresponding to the forward pass, and none corresponding to the backward pass. Can someone clarify what is happening in DTensor TP?