Is making a wrapper around nn.DataParallel to access module attributes safe?

Hello everyone.
I am using PyTorch's nn.DataParallel, but with it I cannot access any of the attributes I defined on my model (I get an AttributeError).

So, I make a wrapper like this:

import torch.nn as nn

class DataParallelWrapper(nn.DataParallel):
    def __getattr__(self, name):
        try:
            # First try the DataParallel wrapper itself
            return super().__getattr__(name)
        except AttributeError:
            # Fall back to the wrapped model's own attributes
            return getattr(self.module, name)

Knowing next to nothing about how nn.DataParallel is implemented, I am left with a question: is this a safe approach?

If it matters, I am using PyTorch 1.1 (need to update code before upgrading to new shiny version).

DataParallel does have a self.module attribute:

The following code works for me:

import torch.nn as nn
import torch
from torch.nn.parallel import DataParallel

class DataParallelWrapper(DataParallel):
    def __init__(self, module):
        super().__init__(module)

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)


dpw = DataParallelWrapper(nn.Linear(2, 2))
print(getattr(dpw, "forward"))  # resolved on the DataParallel wrapper itself
print(getattr(dpw, "weight"))   # falls through to the wrapped nn.Linear

I am fully aware that I can access my network through nn.DataParallel.module. However, I need my code to run without an AttributeError when I access the attributes I defined on my network, whether the user is running on a single GPU or on multiple GPUs. Hence the DataParallelWrapper.
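For illustration, here is a minimal sketch of the usage pattern I have in mind (MyNet and its hidden_dim attribute are made-up names; the point is only that the same attribute access works whether or not the model ends up wrapped):

import torch
import torch.nn as nn


class MyNet(nn.Module):  # hypothetical model, just for illustration
    def __init__(self, hidden_dim=16):
        super().__init__()
        self.hidden_dim = hidden_dim                 # custom attribute I want to reach
        self.fc = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x):
        return self.fc(x)


model = MyNet()
if torch.cuda.device_count() > 1:
    model = DataParallelWrapper(model).cuda()        # multi-GPU: wrap the model
elif torch.cuda.is_available():
    model = model.cuda()                             # single GPU: use the module directly

print(model.hidden_dim)                              # no AttributeError in either case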

The only problem is that I do not know whether this is a good approach (could it introduce any data races or asynchrony problems?).

This should be fine, I think. The main logic of DataParallel is implemented in its forward() function: it replicates the given module to the available devices, scatters the input, launches one thread per device to process the scattered chunks, then joins the threads and gathers the outputs. So as long as you are not modifying module parameters or gradients concurrently while DataParallel.forward() is executing, I don’t see an issue here.
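As a rough illustration of that flow (a simplified sketch, not the actual DataParallel implementation, which also handles kwargs, dim, and various edge cases), using the helpers from torch.nn.parallel:

import torch
import torch.nn as nn
from torch.nn.parallel import replicate, scatter, parallel_apply, gather

def data_parallel_forward(module, x, device_ids, output_device=None):
    # Simplified sketch of what DataParallel.forward() does internally
    if output_device is None:
        output_device = device_ids[0]
    replicas = replicate(module, device_ids)       # copy the module onto each device
    inputs = scatter(x, device_ids)                # split the batch across devices
    replicas = replicas[:len(inputs)]              # batch may be smaller than the number of devices
    outputs = parallel_apply(replicas, inputs)     # one thread per replica
    return gather(outputs, output_device)          # collect the results on one device

if torch.cuda.device_count() > 1:
    net = nn.Linear(8, 4).cuda(0)
    x = torch.randn(32, 8).cuda(0)
    y = data_parallel_forward(net, x, list(range(torch.cuda.device_count())))
    print(y.shape)  # torch.Size([32, 4])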
