AttributeError: 'ModelParallelModel' object has no attribute '_forward_pre_hooks'

The chunks input is a list of nn.Sequential networks from a model I have divided up to run on multiple GPUs/CPUs. The device_list input is a list of devices, for example ['cuda:0', 'cuda:1', 'cuda:2', 'cuda:3'] or ['cpu', 'cuda:0', 'cuda:1']. Both lists will have the same number of values.

Unfortunately I don’t have multiple GPUs, and renting them from online services can be rather expensive. So I want to make sure that I have things right before I try to run the code. I created the following class to run a single model with a batch size of 1 across multiple devices. Basically, each device is supposed to run part of the model before passing its output to the next device.

I put the chunks onto their devices:

    for i, chunk in enumerate(chunks):
        chunk.to(device_list[i])

And then I pass them to my class:

class ModelParallelModel(nn.Sequential):
    def __init__(self, chunks, device_list):
        super(ModelParallelModel, self)
        self.chunks = chunks
        self.device_list = device_list

    def forward(self, input):
        for i, chunk in enumerate(self.chunks):
            if i < len(self.chunks) - 1:
                input = chunk(input.to(self.device_list[i])).to(self.device_list[i + 1])
            else:
                input = chunk(input.to(self.device_list[i]))
        return input

These lines of code are where I create the model and then try to run:

    net = ModelParallelModel(chunks, device_list)
    output = net(input)

Though I get this error:

['cuda:0', 'cuda:1', 'cuda:2', 'cuda:3']
Traceback (most recent call last):
  File "", line 465, in <module>
  File "", line 164, in main
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 538, in __call__
    for hook in self._forward_pre_hooks.values():
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 591, in __getattr__
    type(self).__name__, name))
AttributeError: 'ModelParallelModel' object has no attribute '_forward_pre_hooks'

How do I fix this error? I can’t seem to find anything anywhere about what this error means or how to fix it.

I was trying to follow this guide:

You are missing the __init__ call in super:

# yours
super(ModelParallelModel, self)
# fix to
super(ModelParallelModel, self).__init__()

However, I’m currently not sure why you are deriving from nn.Sequential instead of nn.Module, as you are not using any attributes from nn.Sequential.
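Putting the fix together (and deriving from nn.Module as suggested), a minimal runnable sketch of the corrected class — the toy chunk shapes are made up, and every entry of device_list is 'cpu' here so it runs without GPUs:

```python
import torch
import torch.nn as nn

class ModelParallelModel(nn.Module):
    def __init__(self, chunks, device_list):
        super(ModelParallelModel, self).__init__()  # the previously missing __init__ call
        self.chunks = chunks
        self.device_list = device_list

    def forward(self, input):
        for i, chunk in enumerate(self.chunks):
            if i < len(self.chunks) - 1:
                # run this chunk on its device, then move the activation to the next device
                input = chunk(input.to(self.device_list[i])).to(self.device_list[i + 1])
            else:
                input = chunk(input.to(self.device_list[i]))
        return input

# Two toy chunks stand in for the real model split; swap in
# 'cuda:0', 'cuda:1', ... for device_list on a multi-GPU machine
chunks = [nn.Sequential(nn.Linear(8, 16), nn.ReLU()),
          nn.Sequential(nn.Linear(16, 4))]
device_list = ['cpu', 'cpu']
for chunk, device in zip(chunks, device_list):
    chunk.to(device)

net = ModelParallelModel(chunks, device_list)
output = net(torch.randn(1, 8))
print(output.shape)  # torch.Size([1, 4])
```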


@ptrblck Thanks for the reply, I was able to get my code running on multiple GPUs!

I pull loss values from multiple layers (which can be on different devices) and add them together before calling backward(). Currently, I have to use .to(device) to bring all of these loss values to a single device before summing. It looks a bit ugly, and it’s redundant when I’m using a single GPU/CPU, but I’m not sure if that’s something I could change. DataParallel is for duplicating a model across different GPUs and CPUs, but could I use it only on the loss values so that I don’t have to use .to(device)? And if I can use it, would it slow down my code?
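For reference, a minimal sketch of the pattern described above — summing per-layer losses that live on different devices. The loss values are made up, and both entries of device_list are 'cpu' here so it runs anywhere; note that .to() is essentially a no-op when the tensor is already on the target device, so the single-device case is redundant-looking but cheap:

```python
import torch

# Hypothetical per-layer losses living on (potentially) different devices
device_list = ['cpu', 'cpu']  # e.g. ['cuda:0', 'cuda:1'] on a multi-GPU machine
losses = [torch.tensor(0.5, device=device_list[0], requires_grad=True),
          torch.tensor(0.25, device=device_list[1], requires_grad=True)]

# Tensors on different devices cannot be added directly, so each loss is
# moved to one device before summing; autograd still routes gradients
# back to the original devices during backward()
total = sum(loss.to(device_list[0]) for loss in losses)
total.backward()
print(total.item())  # 0.75
```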

I’m also having an issue where GPU:0 always shows at least some usage. In the few tests I did with the same inputs, changing which layers went on which GPUs, GPU:0 always had at least 850MiB of usage, even when it shouldn’t have been used at all.

Currently I make sure that my inputs and model are CUDA with the following code:

dtype = torch.cuda.FloatTensor

model = model.cuda()

Do I need to do something else in order to prevent GPU:0 from always having extra usage?
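One common cause (an assumption, since the full code isn’t shown) is that torch.cuda.FloatTensor and a bare .cuda() both default to device 0, so a CUDA context of several hundred MiB gets created there even when no layers live on it. Addressing each device explicitly avoids ever touching device 0; a sketch, using 'cpu' so it runs anywhere — substitute e.g. torch.device('cuda:1') on a real setup:

```python
import torch
import torch.nn as nn

# Instead of dtype = torch.cuda.FloatTensor and model.cuda(), name the
# target device explicitly so nothing implicitly lands on device 0
device = torch.device('cpu')  # e.g. torch.device('cuda:1')

model = nn.Linear(4, 2).to(device)
input = torch.randn(1, 4, device=device)  # created directly on the target device
output = model(input)
print(output.device)
```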