Hi,
I’m trying to use DataParallel on my Module, but one of the module’s attributes (nufft_plan) isn’t a torch object, yet it still depends on the module’s device:
class MyModule(torch.nn.Module):
    def __init__(self, data, params):
        super(MyModule, self).__init__()
        # nufft_plan is not a torch object, but it is tied to a specific device
        self.nufft_plan = NufftPlan(params, data.device)
        self.data = torch.nn.Parameter(data)
        self.params = params

    def forward(self, input):
        output = some_computation(self.data, self.nufft_plan, input)
        return output

    def to(self, device):
        module = super().to(device)
        # NufftPlan can't be moved, so recreate it on the target device
        module.nufft_plan = NufftPlan(self.params, device)
        return module
When I use DataParallel on MyModule it doesn’t invoke the .to method, so I can’t move nufft_plan onto each GPU. Is there any other way I can move nufft_plan onto the right device when using DataParallel?
You can move it to the right device inside the forward method using the .device attribute of any other properly registered parameter or of the input.
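A minimal sketch of that idea, reusing NufftPlan and some_computation from the post above (the _plan_device bookkeeping is an assumption, since the plan itself doesn’t expose its device):

def forward(self, input):
    device = input.device  # each DataParallel replica receives inputs on its own GPU
    if getattr(self, "_plan_device", None) != device:
        # rebuild the plan on this replica's device and remember where it lives
        self.nufft_plan = NufftPlan(self.params, device)
        self._plan_device = device
    return some_computation(self.data, self.nufft_plan, input)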
Hi,
This is what I will probably end up doing, but I’d rather avoid moving it in forward, since this object doesn’t actually allow me to move it to a different device; I just overwrite it with a new object on the desired device. It also doesn’t let me check its current device.
If I do the moving in forward, I need to check whether the object is on the right device (so I must save its current device) and move it accordingly.
Is there a more “elegant” way of doing it? Something like using a custom function when wrapping the module with DataParallel?
Thank you very much 
The more elegant and faster approach would be to use DistributedDataParallel, which avoids the model copies performed in nn.DataParallel and allows you to use a single process per device. Since each process moves its model only to its corresponding device, the repeated calls in the forward wouldn’t be needed anymore.
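A minimal sketch of that setup, assuming the MyModule, NufftPlan, data, and params from the original post (the spawn/worker structure is illustrative, not the only way to launch DDP):

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size, data, params):
    # one process per GPU; each process builds its module on its own device once
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}")
    module = MyModule(data.to(device), params)  # nufft_plan is created on the right device here
    ddp_module = DDP(module, device_ids=[rank])

    # ... run the computation with ddp_module on this process's shard of the dataset ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    # data and params as in the original post
    mp.spawn(worker, args=(world_size, data, params), nprocs=world_size)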
I see, I did use DistributedDataParallel (on something else), but the computation I’m trying to do here could be relatively short, so the time it takes to start the processes can actually slow it down. I guess I could use either DataParallel or DistributedDataParallel depending on the size of the dataset (I’m not trying to train a model, just perform some computation over the dataset).
Thank you very much for your help 
Hi,
My question is about model parallelism (tensor parallelism or pipeline parallelism), not DataParallel or DistributedDataParallel.
I am a master’s student working on a related task: performing distributed inference on edge devices. I am facing some challenges with the PyTorch RPC implementation when connecting multiple devices using their IP addresses and port. For now I have two NVIDIA Jetson Nanos and their IP addresses, and I am not sure what the correct way is to serve the NN layers on both edge devices simultaneously.
Any help would be great. Thank you in advance.
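For context, the kind of setup I’m trying to get working looks roughly like this (the IP address, port, worker names, and layer-placement logic below are placeholders, not a working configuration):

import torch
import torch.distributed.rpc as rpc

# Placeholders: replace with the master Jetson's actual IP and a free port.
MASTER_ADDR = "192.168.1.10"
MASTER_PORT = 29500

def run(rank, world_size):
    # each Jetson runs this script with its own rank (0 or 1) and world_size=2
    options = rpc.TensorPipeRpcBackendOptions(
        init_method=f"tcp://{MASTER_ADDR}:{MASTER_PORT}",
        num_worker_threads=8,
    )
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size,
                 rpc_backend_options=options)

    if rank == 0:
        # run the first part of the model locally, then send intermediate
        # activations to the other device, e.g. rpc.rpc_sync("worker1", some_fn, args=(x,))
        pass

    rpc.shutdown()  # blocks until all RPC workers are done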