Export DataParallel to ONNX

Hi! I am new to PyTorch and distributed learning. I am currently training an RL model on 4 GPUs on a server for an embodied navigation task. I process the visual information with the pretrained MobileNet v2 model from Torchvision.

I wrap the visual network in DataParallel, and the visual network is part of the final actor-critic model. However, when I export the final model to ONNX, I get a NoneType error when reading.

From other posts, I understand that if the whole final model is wrapped in DataParallel, I can reach the underlying model through its module attribute. However, I don’t know what to do when only part of the model is wrapped in DataParallel, and I couldn’t find similar issues online.

Hey @Kevin57, could you please create an issue on the PyTorch GitHub repo with the error log? Please also add an onnx label to that issue. Thanks!

Could you please show a code snippet? Is it something like:

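import torch.nn as nn
from torch.nn import DataParallel

class MyModule(nn.Module):
  def __init__(self):
    super().__init__()  # must run before submodules are assigned
    self.l1 = nn.Linear(20, 20)
    self.l2 = DataParallel(nn.Linear(20, 20))  # only this part is wrapped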

If this is the case, does the following work?

net = MyModule()
net.l2.module  # this should return the non-DataParallel linear module
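
To make the distinction concrete, here is a minimal sketch assuming the model really looks like the MyModule guess above. Only the wrapped submodule has a module attribute; the parent model does not, because it was never wrapped itself. The names inner and parent_has_module are just for illustration.

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(20, 20)
        self.l2 = nn.DataParallel(nn.Linear(20, 20))  # only l2 is wrapped

net = MyModule()

# The wrapper exposes the original layer through its `module` attribute.
inner = net.l2.module

# The parent was never wrapped, so it has no `module` attribute;
# accessing net.module directly would raise an AttributeError.
parent_has_module = hasattr(net, "module")
```

So the attribute has to be read from whichever submodule was actually passed to DataParallel, not from the top-level model.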

cc @VitalyFedyunin

Hi! Thank you for your response! I have submitted an issue on the PyTorch GitHub repo, but I am not able to add a label. The issue title is DataParallel to ONNX.

For the code snippet, I think it is similar to my setup! I tried something like net.l2.module from your snippet, but it does not work and gives me the following error message:

  File "/home/qi/mlagents_gesthor/ml-agents/mlagents/trainers/torch/model_serialization.py", line 119, in export_policy_model
    self.policy.actor_critic.module,
  File "/home/qi/gesthor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'SeparateActorCritic' object has no attribute 'module'

where SeparateActorCritic is similar to net in your code snippet.
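
The traceback shows module being read from the top-level actor_critic object, which is not itself a DataParallel wrapper, so the attribute does not exist there. One possible workaround, sketched below under assumptions, is to walk the model and replace every nested DataParallel with the module it wraps before exporting. The helper unwrap_data_parallel and the class VisualActorCritic are hypothetical stand-ins, not part of PyTorch or ML-Agents, and the layer sizes are made up.

```python
import torch
import torch.nn as nn

def unwrap_data_parallel(model: nn.Module) -> nn.Module:
    """Hypothetical helper: strip DataParallel wrappers at any depth.

    Children are replaced in place; the returned module shares its
    parameters with the original.
    """
    # Peel off wrappers at this level (handles doubly wrapped modules too).
    while isinstance(model, nn.DataParallel):
        model = model.module
    # Recurse into children and re-register the unwrapped versions.
    for name, child in list(model.named_children()):
        setattr(model, name, unwrap_data_parallel(child))
    return model

class VisualActorCritic(nn.Module):
    """Hypothetical stand-in for SeparateActorCritic."""
    def __init__(self):
        super().__init__()
        self.visual = nn.DataParallel(nn.Linear(20, 20))  # wrapped part
        self.head = nn.Linear(20, 4)                      # unwrapped part

    def forward(self, x):
        return self.head(self.visual(x))

model = unwrap_data_parallel(VisualActorCritic())
out = model(torch.randn(2, 20))
# `model` now contains no DataParallel wrappers and could be handed to
# torch.onnx.export in the usual way.
```

Because the helper mutates the model in place, it may be safer to run it on a copy (for example one made with copy.deepcopy) if training is meant to continue on multiple GPUs afterwards.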