PyTorch autograd hook in Megatron distributed data parallel

Hi everyone, I'm wondering why we need to expand the tensor to get access to its `grad_fn`, as described here. Could we replace the `expand_as` with a `view` operation instead? Thanks in advance!
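For context, here is a minimal sketch of the pattern I'm asking about (paraphrased from memory of Megatron's DistributedDataParallel wrapper, not the exact code; the hook body is just a placeholder):

```python
import torch

# A leaf parameter has grad_fn == None, so Megatron first runs it through a
# no-copy op (expand_as) to get a non-leaf tensor whose grad_fn chain points
# back at the parameter's AccumulateGrad node.
param = torch.nn.Parameter(torch.randn(4, 4))

param_tmp = param.expand_as(param)
grad_acc = param_tmp.grad_fn.next_functions[0][0]  # AccumulateGrad for `param`

def param_hook(*unused):
    # Placeholder: Megatron uses a hook like this as the trigger point for
    # copying/reducing the gradient once it has been accumulated into param.grad.
    print("gradient accumulated for param")

grad_acc.register_hook(param_hook)

# Megatron also keeps a reference to grad_acc alive (e.g. in a list on the
# module) so the node and its hook are not garbage collected.

param.sum().backward()  # the hook fires during backward
```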