torch.nn.parallel.replicate

Is it possible to change the replicate function so that it does not aggregate all of the gradients on the main model?
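A minimal sketch of the behavior being asked for, run on CPU so it does not need multiple GPUs. It assumes plain `copy.deepcopy` copies as stand-ins for the replicas, which is not how `replicate` works internally; the names `main` and `replicas` are illustrative only. Each copy accumulates its own gradients, and nothing flows back to the main model:

```python
import copy

import torch
import torch.nn as nn

# Hypothetical "main" model; any nn.Module would do here.
main = nn.Linear(4, 2)

# Independent copies: their parameters are new leaf tensors with no
# autograd link back to `main` (an assumption of this sketch, unlike
# the replicas that DataParallel builds).
replicas = [copy.deepcopy(main) for _ in range(2)]

for replica in replicas:
    loss = replica(torch.randn(3, 4)).sum()
    loss.backward()  # gradients land on this replica's parameters only

print(main.weight.grad)                         # no gradient on the main model
print([r.weight.grad.shape for r in replicas])  # one gradient per replica
```

With copies like these, keeping the replicas in sync with the main model (or combining their gradients later) has to be done by hand.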
