Hi All, I am running a machine learning experiment in which I need to implement parallel processing across multiple GPUs to speed up the forward function of a neural network. I understand that the nn.DistributedDataParallel
class can only be used as a wrapper around a whole module, from the outside. I am wondering if there is an approach that provides the same functionality inside a forward function: what I am trying to do is run duplicates of just one of the network's layers on multiple GPUs simultaneously and collect their outputs.
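
For context, here is a minimal sketch of the kind of thing I have in mind, built on torch.nn.parallel.replicate / parallel_apply / scatter / gather (the helpers nn.DataParallel uses internally). MyNet, the layer sizes, and the two-GPU device list are just placeholders I made up for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import replicate, parallel_apply, scatter, gather

class MyNet(nn.Module):  # placeholder network for illustration
    def __init__(self):
        super().__init__()
        self.pre = nn.Linear(64, 64)
        self.shared = nn.Linear(64, 32)  # the one layer I want duplicated across GPUs
        self.devices = [torch.device("cuda", 0), torch.device("cuda", 1)]

    def forward(self, x):
        x = self.pre(x)
        # split the batch into one chunk per GPU
        chunks = scatter(x, [d.index for d in self.devices])
        # copy just this layer onto each GPU (autograd-aware broadcast)
        replicas = replicate(self.shared, self.devices[: len(chunks)])
        # run the replicas concurrently, one chunk per device
        outs = parallel_apply(replicas, chunks)
        # collect the outputs back onto the first GPU
        return gather(outs, self.devices[0].index)

net = MyNet().to("cuda:0")  # module (and input) start on cuda:0
y = net(torch.randn(8, 64, device="cuda:0"))
```

Is something along these lines the right approach, or is there a cleaner way to get simultaneous output from one replicated layer inside forward?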