Hi! I am new to PyTorch and distributed learning. I am currently training an RL model on 4 GPUs on a server for an embodied navigation task. I process the visual information using a pretrained MobileNet v2 model from Torchvision.
I wrap the visual network in DataParallel, and the visual network is included in the final actor-critic model. However, when I export the final model to ONNX, it gives a NoneType error when reading.
I read from other posts and understand that if I wrap the final model into DataParallel, I could call the module attribute. However, I don’t know what exactly I should do when part of the model is wrapped in DataParallel and I couldn’t find similar issues online.
Hey @Kevin57, could you please create an issue on PyTorch github with the error log? And please also add an onnx label to that issue. Thanks!
Could you please show a code snippet? Is it something like:
Hi! Thank you for your response! I have submitted an issue on the PyTorch GitHub, but I am not able to add a label. The issue is titled "DataParallel to ONNX".
For the code snippet, I think this is a similar example to mine! I tried to use something like net.ls.module in your code snippet, but it does not work and it gives me the following error message:
File "/home/qi/mlagents_gesthor/ml-agents/mlagents/trainers/torch/model_serialization.py", line 119, in export_policy_model
self.policy.actor_critic.module,
File "/home/qi/gesthor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'SeparateActorCritic' object has no attribute 'module'
where SeparateActorCritic is similar to net in your code snippet.
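The traceback above shows that the outer model is not itself a DataParallel, so only the wrapped submodule has a .module attribute. A minimal sketch of one way to strip DataParallel wrappers nested anywhere inside a model before export follows. The Net class, its ls and head attributes, and the unwrap_data_parallel helper are hypothetical stand-ins, not code from ml-agents or from the snippet discussed above, and it has not been confirmed that this resolves the NoneType error during ONNX export.

```python
import torch
import torch.nn as nn


def unwrap_data_parallel(model: nn.Module) -> nn.Module:
    """Return the model with every nested nn.DataParallel replaced by the module it wraps."""
    if isinstance(model, nn.DataParallel):
        # DataParallel keeps the original network in its .module attribute.
        return unwrap_data_parallel(model.module)
    # Copy the children into a list so that reassigning them is safe while looping.
    for name, child in list(model.named_children()):
        setattr(model, name, unwrap_data_parallel(child))
    return model


class Net(nn.Module):
    """Hypothetical stand-in: only the submodule ls is wrapped in DataParallel."""

    def __init__(self):
        super().__init__()
        self.ls = nn.DataParallel(nn.Linear(4, 2))
        self.head = nn.Linear(2, 1)

    def forward(self, x):
        return self.head(self.ls(x))


net = Net()
# The outer model is a plain nn.Module, so it has no .module attribute;
# only the wrapped submodule does.
outer_has_module = hasattr(net, "module")
inner = net.ls.module

# Strip the wrappers in place and move everything to the CPU before exporting.
net = unwrap_data_parallel(net).cpu()
out = net(torch.randn(3, 4))
```

After unwrapping, the model contains no DataParallel layers and can be passed to torch.onnx.export like any ordinary module. Note that the helper modifies the model in place, so if training should continue on multiple GPUs afterwards, it may be safer to unwrap a copy (for example one made with copy.deepcopy).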