Issues on using nn.DataParallel with Python 3.10 and PyTorch 1.11

I have subclassed nn.DataParallel so that it can handle a modified network (a GenForce GAN generator in my case), as follows:

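import torch.nn as nn

class DataParallelPassthrough(nn.DataParallel):
    def __getattr__(self, name):
        try:
            # First try the regular nn.Module lookup (parameters, buffers, submodules).
            return super().__getattr__(name)
        except AttributeError:
            # Fall back to attributes defined on the wrapped network.
            return getattr(self.module, name)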

If I use the standard nn.DataParallel, I get the following error:

Traceback (most recent call last):
  [...]
  File "/home/***/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'dim_z'

that is, errors about attributes that I have added to my network but that the standard nn.DataParallel isn’t “aware of”.
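To illustrate the lookup order, here is a minimal pure-Python sketch of the same fallback pattern. The classes `Wrapper`, `WrapperPassthrough` and `Generator`, and the value 512 for `dim_z`, are hypothetical stand-ins and not PyTorch code. The real nn.Module keeps submodules in an internal dictionary, which this sketch simplifies to a plain attribute.

```python
class Wrapper:
    """Hypothetical stand-in for nn.DataParallel: keeps the wrapped object in `module`."""

    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; mimic the wrapper's error.
        raise AttributeError(
            "'{}' object has no attribute '{}'".format(type(self).__name__, name)
        )


class WrapperPassthrough(Wrapper):
    def __getattr__(self, name):
        try:
            # Ask the wrapper first.
            return super().__getattr__(name)
        except AttributeError:
            # Then fall back to the wrapped object.
            return getattr(self.module, name)


class Generator:
    """Hypothetical network carrying a custom attribute."""

    def __init__(self):
        self.dim_z = 512  # illustrative value only


generator = Generator()
print(WrapperPassthrough(generator).dim_z)  # resolved on the wrapped object
try:
    Wrapper(generator).dim_z
except AttributeError as err:
    print(err)  # the plain wrapper does not know about dim_z
```

The fallback only triggers after the wrapper's own lookup has failed, so attributes that the wrapper itself defines keep their precedence.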

DataParallelPassthrough used to work just fine, but after upgrading to Python 3.10 (‘Python 3.10.2’) and PyTorch 1.11 (‘1.11.0+cu102’), I get the following error when I actually use data parallelism:

Traceback (most recent call last):
  [...]
  File "/home/***/lib/python3.10/site-packages/torch/cuda/nccl.py", line 51, in _check_sequence_type
    if not isinstance(inputs, collections.Container) or isinstance(inputs, torch.Tensor):
AttributeError: module 'collections' has no attribute 'Container'

If I don’t call DataParallelPassthrough (i.e., if the batch size allows using a single GPU), everything is fine.
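For context, the abstract base class aliases that used to be reachable as `collections.<Name>` were removed in Python 3.10 and are available only under `collections.abc`. The sketch below checks this on the running interpreter. The helper name `is_container` is an assumption for illustration and is not part of PyTorch.

```python
import collections
import collections.abc
import sys

# On Python 3.10+ this is expected to be False, since the old alias was removed.
legacy_alias_present = hasattr(collections, "Container")
print(sys.version_info[:2], "collections.Container available:", legacy_alias_present)


def is_container(obj):
    """Container check that uses the collections.abc location of the ABC."""
    return isinstance(obj, collections.abc.Container)


print(is_container([1, 2, 3]))  # lists define __contains__
print(is_container(3))          # plain integers do not
```

This would also be consistent with the single-GPU case working: the failing check lives in torch/cuda/nccl.py and, judging by the traceback, is only reached when data parallelism is actually used.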

Any ideas on how I could fix this? Any insight on a better way to parallelize my (modified) network?

Thank you!

Based on this comment it seems to be related to Python 3.10.
Could you downgrade your Python version and check if 3.9 would work (in a new virtual environment)?

@ptrblck Thanks for your comment, I was aware of it being Python3.10-related but I thought I should ask here in case there are any insights on how to solve this, or even whether there’s a “better” way to parallelize my model.

Indeed, with python 3.9 I had no problems (not tested with python 3.9 AND PyTorch 1.11 though).

I think the nightly binary could work with Python 3.10 as this PR seems to have fixed the issue.

Just to confirm, I just tried to install the nightly binary as:

pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html

and checked again, but it has not been fixed.

That’s strange, since the failing line of code in the current master is using:

def _check_sequence_type(inputs: Union[torch.Tensor, Sequence[torch.Tensor]]) -> None:
    if not isinstance(inputs, collections.abc.Container) or isinstance(inputs, torch.Tensor):
        raise TypeError("Inputs should be a collection of tensors")

now and doesn’t fit your error message anymore:

if not isinstance(inputs, collections.Container) or isinstance(inputs, torch.Tensor):

Indeed, I get the same error:

  File "/home/***/lib/python3.10/site-packages/torch/cuda/nccl.py", line 51, in _check_sequence_type
    if not isinstance(inputs, collections.Container) or isinstance(inputs, torch.Tensor):
AttributeError: module 'collections' has no attribute 'Container'

Apparently I had a different PyTorch version before. I think you linked a forked (by carmocca) version before, but this seems to be the case in the master of the original repo as well.

Apologies if this is a stupid question, but shouldn’t I get this version (i.e., the master branch) by installing pytorch simply using an entry torch in the requirements.txt file in a venv? I think I’ve missed something trivial here…

Yes, you are right and I just followed the PR file change. The pytorch/master branch however also has this change as you’ve noticed.

No, I don’t think setting torch as a required package would install the nightly binary.
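One way to check which kind of build ended up in the environment is to look at the installed version string. This is a rough sketch: the helper names are made up for illustration, and treating a `.dev` segment as the marker of a nightly build is a heuristic assumption (a nightly version looks like `1.12.0.dev20220317+cu113`, while the stable release mentioned above is `1.11.0+cu102`).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def looks_like_nightly(version):
    """Heuristic (assumption): development builds carry a '.dev' segment."""
    return ".dev" in version


version = installed_version("torch")
if version is None:
    print("torch is not installed in this environment")
else:
    print(version, "nightly" if looks_like_nightly(version) else "stable")
```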

Sorry, my bad, but shouldn’t the nightly binary be installed as follows?

pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html

I just checked again (after uninstalling torch and installing it again using the above line) and the condition is still isinstance(inputs, collections.Container).

I see the expected change in the current nightly, so I guess you are still looking at the wrong installation:

root@4031504dc1c7:/workspace# pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html
Looking in links: https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html
Collecting torch
  Downloading https://download.pytorch.org/whl/nightly/cu113/torch-1.12.0.dev20220317%2Bcu113-cp38-cp38-linux_x86_64.whl (1623.1 MB)
     |████████████████████████████████| 1623.1 MB 1.8 kB/s 
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.8/site-packages (from torch) (3.10.0.0)
Installing collected packages: torch
Successfully installed torch-1.12.0.dev20220317+cu113
root@4031504dc1c7:/workspace# python -c "import torch; print(torch.__path__)"
['/opt/conda/lib/python3.8/site-packages/torch']
root@4031504dc1c7:/workspace# sed -n 51p /opt/conda/lib/python3.8/site-packages/torch/cuda/nccl.py 
    if not isinstance(inputs, collections.abc.Container) or isinstance(inputs, torch.Tensor):

@ptrblck even though I was using the same nightly binary, for some strange reason it refused to update.

Eventually, I uninstalled all torch* from the venv, reinstalled and now I’m seeing the correct line.
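For anyone who wants to confirm which file their interpreter actually picks up, here is a small sketch that locates a module's source file and lists the lines still referring to the removed alias. The helper names are hypothetical, and the only PyTorch-specific assumption is the module name `torch.cuda.nccl`, taken from the traceback.

```python
import importlib.util
from pathlib import Path


def module_source_path(module_name):
    """Return the source file of a module, or None if it cannot be located."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def lines_using_legacy_alias(path, alias="collections.Container"):
    """Return (line_number, text) pairs that still reference the removed alias."""
    hits = []
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if alias in line:
            hits.append((number, line.strip()))
    return hits


path = module_source_path("torch.cuda.nccl")
if path is not None and path.is_file():
    # An empty list means the installed file already uses collections.abc.
    print(path, lines_using_legacy_alias(path))
else:
    print("torch.cuda.nccl could not be located in this environment")
```

Note that locating a dotted module name imports its parent packages, so this runs the same `import torch` the interpreter would normally perform.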

Thanks for your help and patience!

@nullgeppetto I think many people are running into this problem right now because Python 3.10 and torch 1.11 are both very recent releases!
It looks like Python 3.10 no longer provides collections.Container, so torch/cuda/nccl.py has to use collections.abc.Container instead.
There is no need to change the PyTorch version!
I’m glad it worked out ^^

Hi @mingun0112, indeed I observed the same thing today – changed to collections.abc.Container as a dirty but quick workaround.

Thanks for providing a solution! But could you please also tell me how and where it has to be changed?

Hi @jure, apologies for the late reply.

So, you need to change lib/python3.10/site-packages/torch/cuda/nccl.py, at line 51:

def _check_sequence_type(inputs: Union[torch.Tensor, Sequence[torch.Tensor]]) -> None:
    if not isinstance(inputs, collections.abc.Container) or isinstance(inputs, torch.Tensor):
        raise TypeError("Inputs should be a collection of tensors")
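
If editing files under site-packages is not an option, a possible alternative (not something proposed in this thread, and only a stopgap) is to restore the removed alias at runtime before the code that needs it runs. This is a sketch that assumes the affected code looks the name up on the `collections` module at call time, as the traceback suggests.

```python
import collections
import collections.abc

# Restore the alias only if this interpreter no longer provides it.
if not hasattr(collections, "Container"):
    collections.Container = collections.abc.Container

# Import torch and run the data-parallel code after this point, e.g.:
# import torch

print(collections.Container is collections.abc.Container)
```

Monkey-patching the standard library affects every module in the process, so installing a PyTorch build that already contains the fix is the cleaner long-term route.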

Thank you so much!