I am working with the torch.nn.parallel.DistributedDataParallel class and I saw that it assigns the input module to self.module. From the source code, at line 599 of torch/nn/parallel/distributed.py:

self.module = module
So for an instance of DDP, say:

ddp = parallel.DistributedDataParallel(module=mod)

when I try to access ddp.__dict__['module'], the key is not found. However, I can access the attribute simply by doing ddp.module. When I print ddp.__dict__.keys() I get the following:
dict_keys(['training', '_parameters', '_buffers', '_non_persistent_buffers_set', '_backward_hooks',
'_is_full_backward_hook', '_forward_hooks', '_forward_pre_hooks', '_state_dict_hooks',
'_load_state_dict_pre_hooks', '_modules', 'is_multi_device_module', 'device_type', 'device_ids',
'output_device', 'process_group', 'static_graph', 'dim', 'device', 'broadcast_buffers',
'find_unused_parameters', 'require_backward_grad_sync', 'require_forward_param_sync',
'ddp_uneven_inputs_config', 'gradient_as_bucket_view', 'parameters_to_ignore',
'broadcast_bucket_size', 'bucket_bytes_cap', 'use_side_stream_for_tensor_copies',
'_module_copies', 'modules_params', 'modules_buffers', 'num_iterations', 'reducer', 'logger'])
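For reference, here is roughly how I reproduce this in a single process (this is just a minimal sketch; the gloo backend, the localhost rendezvous settings, and the nn.Linear are placeholders for my actual setup):

import os
import torch.distributed as dist
from torch import nn
from torch.nn import parallel

# Minimal single-process setup just so DDP can be constructed
# (assumes the gloo backend is available on this machine).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

mod = nn.Linear(4, 4)  # stand-in for my real module
ddp = parallel.DistributedDataParallel(module=mod)

print('module' in ddp.__dict__)  # False -- the key is not in __dict__
print(ddp.module is mod)         # True  -- yet attribute access still works

dist.destroy_process_group()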
I couldn’t find in the source code how this masking is achieved, or whether there is a reason for it, but I would like to know.