PyTorch autograd hook in Megatron distributed data parallel

Hi everyone, I'm wondering why we need to expand the tensor to get access to its `grad_fn`, as described here. Could we replace the `expand_as` with a `view` operation instead? Thanks in advance!
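For context, here is a minimal sketch of the pattern I'm asking about (paraphrased from memory of Megatron's DistributedDataParallel wrapper, not the exact code; the hook body is just a placeholder):

```python
import torch

# A leaf parameter has grad_fn == None, so Megatron first runs it through a
# no-copy op (expand_as) to get a non-leaf tensor whose grad_fn chain points
# back at the parameter's AccumulateGrad node.
param = torch.nn.Parameter(torch.randn(4, 4))

param_tmp = param.expand_as(param)
grad_acc = param_tmp.grad_fn.next_functions[0][0]  # AccumulateGrad for `param`

def param_hook(*unused):
    # Placeholder: Megatron uses a hook like this as the trigger point for
    # copying/reducing the gradient once it has been accumulated into param.grad.
    print("gradient accumulated for param")

grad_acc.register_hook(param_hook)

# Megatron also keeps a reference to grad_acc alive (e.g. in a list on the
# module) so the node and its hook are not garbage collected.

param.sum().backward()  # the hook fires during backward
```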