torch.nn.parallel.replicate

srxzr · December 4, 2019, 7:47pm

Is it possible to modify the replicate function so that the replicas do not aggregate all of their gradients back onto the main (original) model?
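For context, a minimal sketch of one possible workaround, not a change to replicate itself. By default, replicate broadcasts the original parameters through autograd. A backward pass through any replica therefore sums its gradients into the original module's .grad. If fully independent copies are acceptable, copy.deepcopy produces replicas whose parameters are new leaf tensors with no autograd link to the main model. Because replicate targets CUDA devices, the sketch stays on CPU and uses deepcopy instead. The names main_model, replicas and inputs are hypothetical.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical "main" model that should NOT receive the replicas' gradients.
main_model = nn.Linear(4, 2)

# Assumed alternative to nn.parallel.replicate: each deep copy is an
# independent module whose parameters are new leaf tensors, so autograd
# has no path from a replica back to main_model.
# (On GPUs you would additionally move each copy with .to(device).)
replicas = [copy.deepcopy(main_model) for _ in range(2)]

# One hypothetical input batch per replica.
inputs = [torch.randn(3, 4) for _ in replicas]
for replica, x in zip(replicas, inputs):
    replica(x).sum().backward()

# Each replica keeps its own gradient; main_model's .grad is untouched.
print(main_model.weight.grad)
print([r.weight.grad.shape for r in replicas])
```

The trade-off is that deep copies do not stay in sync with the main model. After any update you would have to copy the weights across yourself, for example with replica.load_state_dict(main_model.state_dict()).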