I am working with the torch.nn.parallel.DistributedDataParallel class and I saw that it assigns the input module to self.module. From the source code, at line 599 of torch/nn/parallel/distributed.py:

self.module = module
So for an instance of DDP, say:

ddp = parallel.DistributedDataParallel(module=mod)

when I try to access ddp.__dict__['module'], the key is not found. However, I can access the attribute simply by doing ddp.module. When I print ddp.__dict__.keys() I get the following:
dict_keys(['training', '_parameters', '_buffers', '_non_persistent_buffers_set', '_backward_hooks',
'_is_full_backward_hook', '_forward_hooks', '_forward_pre_hooks', '_state_dict_hooks',
'_load_state_dict_pre_hooks', '_modules', 'is_multi_device_module', 'device_type', 'device_ids',
'output_device', 'process_group', 'static_graph', 'dim', 'device', 'broadcast_buffers',
'find_unused_parameters', 'require_backward_grad_sync', 'require_forward_param_sync',
'ddp_uneven_inputs_config', 'gradient_as_bucket_view', 'parameters_to_ignore',
'broadcast_bucket_size', 'bucket_bytes_cap', 'use_side_stream_for_tensor_copies',
'_module_copies', 'modules_params', 'modules_buffers', 'num_iterations', 'reducer', 'logger'])
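For reference, here is roughly how I reproduce this in a single process (this is just a minimal sketch; the gloo backend, the localhost rendezvous settings, and the nn.Linear are placeholders for my actual setup):

import os
import torch.distributed as dist
from torch import nn
from torch.nn import parallel

# Minimal single-process setup just so DDP can be constructed
# (assumes the gloo backend is available on this machine).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

mod = nn.Linear(4, 4)  # stand-in for my real module
ddp = parallel.DistributedDataParallel(module=mod)

print('module' in ddp.__dict__)  # False -- the key is not in __dict__
print(ddp.module is mod)         # True  -- yet attribute access still works

dist.destroy_process_group()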
I couldn’t find in the source code how this masking is achieved, or whether there is a reason for it, but I would like to know.