Maximum recursion bug?

Martin_K · August 15, 2019, 10:34am

I was trying to build a tree-like neural net with nested modules in ModuleList/ModuleDict objects. However, I encountered a maximum recursion error at the top node, the ‘root’ of the tree. To keep it simple, I created a minimal example that reproduces the error (PyTorch 1.2):


import torch.nn as nn

class TreeNode_Test(nn.Module):
    def __init__(self):
        super(TreeNode_Test, self).__init__()
        # The module registers itself as its own submodule, which creates a cycle
        self.nodesInLevels = nn.ModuleList([self])

myModel = TreeNode_Test()
myModel  # evaluating this or myModel.nodesInLevels raises the max recursion error:


  File "C:\Users\mk23\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1042, in __repr__
    mod_str = repr(module)

  File "C:\Users\mk23\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1042, in __repr__
    mod_str = repr(module)

  File "C:\Users\mk23\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1036, in __repr__
    extra_repr = self.extra_repr()

RecursionError: maximum recursion depth exceeded

Any ideas?

albanD · August 15, 2019, 10:39am

Hi,

Your model references itself when you put self in your ModuleList. So the print function, which tries to print all the modules contained in your model, will recurse until it hits the recursion limit.
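One way to keep the tree structure without the cycle is to register only the children as submodules and keep any reference back to the parent out of the module registry. The following is a minimal sketch, not a confirmed fix for the original use case: the class name `TreeNode`, the `parent` attribute, and the layer sizes are made up for illustration, and whether an unregistered back-reference behaves as intended under DataParallel replication is not verified here.

```python
import torch.nn as nn


class TreeNode(nn.Module):
    """Hypothetical tree node: children are registered, the parent is not."""

    def __init__(self, depth, parent=None):
        super().__init__()
        # Bypass nn.Module.__setattr__ so the parent lands in the plain
        # instance __dict__ and is NOT registered as a submodule.
        object.__setattr__(self, "parent", parent)
        self.linear = nn.Linear(4, 4)
        # Only descendants go into the ModuleList, so the module graph is acyclic.
        self.children_nodes = nn.ModuleList(
            [TreeNode(depth - 1, parent=self) for _ in range(2)] if depth > 0 else []
        )


root = TreeNode(depth=2)
text = repr(root)            # terminates, since no module contains itself
root.apply(lambda m: None)   # apply() also terminates for the same reason
```

The back-reference is still reachable as `node.parent`, but `repr`, `apply`, `modules()` and similar traversals only follow registered submodules and therefore never walk back up the tree.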

Martin_K · August 15, 2019, 12:52pm

That is an explanation, but not a solution.

I don’t think this behaviour is correct. If I change the parent class to object instead of nn.Module, or replace the nn.ModuleList with a plain Python list, it works as expected, but then it no longer works with DataParallel: the model isn’t replicated properly across multiple GPUs, and I end up with the dreaded tensors/parameters on different GPUs error…

It isn’t just print (which I could avoid by not calling it); pretty much everything ends up in an infinite loop, e.g. module.apply(fn), which I can’t avoid using.

ptrblck · August 15, 2019, 3:04pm

What would your use case be?
If you use self as a module inside nn.Module, even the __call__ function will try to recursively call itself.

WilliamOnVoyage · February 19, 2021, 10:39pm

Hi, I have encountered the same issue when following this parallelism tutorial: Multi-GPU Examples — PyTorch Tutorials 1.7.1 documentation → Attributes of the wrapped module

Simple code snippet to reproduce:

>>> import torch
>>> class TorchDataParallel(torch.nn.DataParallel):
...     def __getattr__(self, name):
...         return getattr(self.module, name)
... 
>>> block = torch.nn.Module()
>>> parallel_block = TorchDataParallel(block)
>>> parallel_block.stream_names
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __getattr__
  File "<stdin>", line 3, in __getattr__
  File "<stdin>", line 3, in __getattr__
  [Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded

Wondering what would be the right way to access custom attributes?

ptrblck · February 20, 2021, 6:36am

Which PyTorch version are you using?
I’ve just tried it out on a ~1 week old source build and don’t get an error.

radugrosu · June 2, 2021, 4:29pm

I get the same on ‘1.7.1+cu101’.
I can get it to work with:

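import torch

class DataParallel(torch.nn.parallel.DataParallel):
    def __getattr__(self, name):
        # Fetch the wrapped module straight from the _modules registry;
        # writing self.module here would re-enter __getattr__ and recurse.
        module = object.__getattribute__(self, "_modules")["module"]
        if name == "module":
            return module
        # Forward every other missing attribute to the wrapped module.
        return getattr(module, name)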
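
For context on why the recursion happens: nn.Module keeps registered submodules in `self._modules` rather than in the instance `__dict__`, so the lookup of `self.module` itself goes through `__getattr__`. An override that only does `getattr(self.module, name)` therefore calls itself forever. An alternative sketch is to try the regular nn.Module lookup first and only fall back to the wrapped module when that fails. The class name `DelegatingDataParallel` and the `stream_names` values below are illustrative assumptions, not taken from the tutorial.

```python
import torch.nn as nn


class DelegatingDataParallel(nn.DataParallel):  # hypothetical name
    def __getattr__(self, name):
        try:
            # Regular nn.Module lookup: parameters, buffers, submodules
            # (this is what resolves "module" itself).
            return super().__getattr__(name)
        except AttributeError:
            # Guard: if "module" is not registered yet, fail instead of recursing.
            if name == "module":
                raise
            # Otherwise forward the lookup to the wrapped module.
            return getattr(self.module, name)


block = nn.Linear(2, 2)
block.stream_names = ["rgb", "depth"]  # hypothetical custom attribute
parallel_block = DelegatingDataParallel(block)
print(parallel_block.stream_names)
```

With this variant, attributes that exist on neither the wrapper nor the wrapped module raise a normal AttributeError instead of a RecursionError.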