List of nn.Module in a nn.Module

Hello all,
I want to create an RNN-like module with a fixed number of timesteps. The weights of each timestep are untied (not shared).
To achieve that, I make a list and append separate linear modules to it. But when I call functions such as print() or .cuda() on the parent module, they do not recognize the modules in the list.
Example to illustrate my problem:

import torch.nn as nn

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100) #dummy module
        self.linear_combines1 = []
        self.linear_combines2 = []
        for i in range(step):
            self.linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            self.linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))

net = testNet(128, 256, 3)
print(net) #Won't print what is in the list
net.cuda() #Won't send the module in the list to gpu
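
One way to confirm the symptom is to list what the parent module has actually registered. The following is a minimal sketch, assuming a recent PyTorch; the class name PlainListNet is hypothetical and only mirrors the example above with a single plain Python list.

```python
import torch.nn as nn

class PlainListNet(nn.Module):  # hypothetical name, mirrors testNet above
    def __init__(self, input_dim, hidden_dim, step=1):
        super().__init__()
        self.linear = nn.Linear(100, 100)  # registered: assigned directly
        # A plain Python list is stored as an ordinary attribute,
        # so the Linear layers inside it are not registered.
        self.linear_combines = [nn.Linear(input_dim, hidden_dim) for _ in range(step)]

net = PlainListNet(128, 256, 3)
registered = [name for name, _ in net.named_parameters()]
print(registered)  # ['linear.weight', 'linear.bias']
```

Only the dummy layer shows up, which is why print(), .cuda(), and an optimizer built from net.parameters() all miss the layers in the list.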

What is the intended correct way to do this?

Hi,

We had a discussion about a similar thing yesterday on the PyTorch Slack channel, and allowing parameters/modules in lists is tricky.

But we can work around that using a dedicated Module (similar to Sequential) whose elements can be accessed as if it were a list.
Here is an example:

import torch
import torch.nn as nn

class ListModule(nn.Module):
    def __init__(self, *args):
        super(ListModule, self).__init__()
        idx = 0
        for module in args:
            self.add_module(str(idx), module)
            idx += 1

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self._modules):
            raise IndexError('index {} is out of range'.format(idx))
        it = iter(self._modules.values())
        for i in range(idx):
            next(it)
        return next(it)

    def __iter__(self):
        return iter(self._modules.values())

    def __len__(self):
        return len(self._modules)

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100) #dummy module
        linear_combines1 = []
        linear_combines2 = []
        for i in range(step):
            linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))
        self.linear_combines1 = ListModule(*linear_combines1)
        self.linear_combines2 = ListModule(*linear_combines2)

net = testNet(128, 256, 3)
print(net)
net.cuda()

print(net.linear_combines1[0])
print(len(net.linear_combines2))
for i in net.linear_combines1:
    print(i.weight.data.type())

Note that my example implementation does not implement a forward in ListModule, and you are supposed to index its elements to get the corresponding module.

To complement @fmassa’s post, it fails because we only capture modules that are assigned directly to the Module object. It gets too tricky and bug prone otherwise. There are a number of tricks you can use to get around it, with ListModule shown above being one of them. If I were to suggest something, I’d keep all the modules in a single container like this:

class AttrProxy(object):
    """Translates index lookups into attribute lookups."""
    def __init__(self, module, prefix):
        self.module = module
        self.prefix = prefix

    def __getitem__(self, i):
        return getattr(self.module, self.prefix + str(i))


class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        for i in range(steps):
            self.add_module('i2h_' + str(i), nn.Linear(input_dim, hidden_dim))
            self.add_module('h2h_' + str(i), nn.Linear(hidden_dim, hidden_dim))
        self.i2h = AttrProxy(self, 'i2h_')
        self.h2h = AttrProxy(self, 'h2h_')

    def forward(self, input, hidden):
        # here, use self.i2h[t] and self.h2h[t] to index 
        # input2hidden and hidden2hidden modules for each step,
        # or loop over them, like in the example below
        # (assuming first dim of input is sequence length)
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = torch.tanh(i2h(inp) + h2h(hidden))  # use the per-step slice inp, not the whole input
        return hidden

Thank you! Both ideas are great. I took some time to combine the two, and here is my take on it:

class ListModule(object):
    # Should work with any kind of module
    def __init__(self, module, prefix, *args):
        self.module = module
        self.prefix = prefix
        self.num_module = 0
        for new_module in args:
            self.append(new_module)

    def append(self, new_module):
        if not isinstance(new_module, nn.Module):
            raise ValueError('Not a Module')
        else:
            self.module.add_module(self.prefix + str(self.num_module), new_module)
            self.num_module += 1

    def __len__(self):
        return self.num_module

    def __getitem__(self, i):
        if i < 0 or i >= self.num_module:
            raise IndexError('Out of bound')
        return getattr(self.module, self.prefix + str(i))


class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        self.i2h = ListModule(self, 'i2h_')
        self.h2h = ListModule(self, 'h2h_')
        for i in range(steps):
            self.i2h.append(nn.Linear(input_dim, hidden_dim))
            self.h2h.append(nn.Linear(hidden_dim, hidden_dim))

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = torch.tanh(i2h(inp) + h2h(hidden))
        return hidden

from torch.autograd import Variable

net = testNet(128, 256, 3)
print(net)
net.cuda()
inp = Variable(torch.randn(3, 4, 128)).cuda()  # (steps, batch, input_dim)
init = Variable(torch.randn(4, 256)).cuda()    # (batch, hidden_dim)
out = net(inp, init)

It works like a Python list, as in @fmassa’s idea, but the modules themselves are kept in the caller’s container, as in @apaszke’s idea.
By the way, is it possible to include this in the main PyTorch package?

Everything here is very informative. Thank you.
Looking forward to the inclusion into the main PyTorch package too!

PyTorch now officially supports lists of Modules/Parameters, according to this commit: https://github.com/pytorch/pytorch/commit/c7c8aaa7f040dd449dbc6aca9204b2f943aef477
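
For readers arriving later, here is a minimal sketch of the original untied-RNN example written with the built-in nn.ModuleList container. It assumes a recent PyTorch; the class name UntiedRNN is hypothetical, and the tensor shapes simply follow the earlier example in this thread.

```python
import torch
import torch.nn as nn

class UntiedRNN(nn.Module):  # hypothetical name for the thread's testNet
    def __init__(self, input_dim, hidden_dim, steps=1):
        super().__init__()
        # nn.ModuleList registers every element as a sub-module.
        self.i2h = nn.ModuleList([nn.Linear(input_dim, hidden_dim) for _ in range(steps)])
        self.h2h = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(steps)])

    def forward(self, inputs, hidden):
        # inputs: (steps, batch, input_dim), hidden: (batch, hidden_dim)
        for inp, i2h, h2h in zip(inputs, self.i2h, self.h2h):
            hidden = torch.tanh(i2h(inp) + h2h(hidden))
        return hidden

net = UntiedRNN(128, 256, steps=3)
out = net(torch.randn(3, 4, 128), torch.randn(4, 256))
n_params = len(list(net.parameters()))  # 3 steps * 2 layers * (weight + bias) = 12
```

Because the layers are registered, print(net), net.cuda(), and net.parameters() all see them without any helper class.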

Yup, just wanted to wait until I write the docs before posting here. They should be up today or tomorrow.

@apaszke, I cannot find the documentation you are talking about.
Could you share a link, perhaps? Furthermore, do I need to install from source to get this feature, or will Conda suffice?

The docs aren’t merged yet. I’ve been working on bug fixes.

In your example, the elements of *args are already modules. How does one convert a (trainable) variable to a module? This failed for me:

    self.W = Variable(w_init, requires_grad=True)
    self.mod_list = torch.nn.ModuleList([self.W])

What do you want to do exactly?
A Variable is a variable, and doesn’t imply any computation, so making a Module from just a Variable doesn’t make sense to me. A Module carries information about how its different Variables are combined to produce an output.
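
If the goal is only to have a trainable tensor registered on the module, one option is nn.Parameter, with nn.ParameterList for a list of them. The following is a minimal sketch under that assumption; the class name WithRawWeights and the attribute W_list are hypothetical, and this is not necessarily what the poster intended.

```python
import torch
import torch.nn as nn

class WithRawWeights(nn.Module):  # hypothetical example class
    def __init__(self, w_inits):
        super().__init__()
        # A single trainable tensor: wrap it in nn.Parameter.
        self.W = nn.Parameter(w_inits[0].clone())
        # Several trainable tensors: keep them in nn.ParameterList.
        self.W_list = nn.ParameterList([nn.Parameter(w.clone()) for w in w_inits])

    def forward(self, x):
        # The computation lives here; the parameters alone do nothing.
        for W in self.W_list:
            x = x @ W
        return x @ self.W

net = WithRawWeights([torch.randn(4, 4) for _ in range(2)])
names = sorted(name for name, _ in net.named_parameters())
y = net(torch.randn(3, 4))
```

nn.ModuleList only accepts nn.Module instances, which is consistent with the failure reported above when a bare Variable was passed to it.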

A linear “module” without a bias is considered a module even though it is simply a Variable (a matrix with some dimensions). I don’t see the difference between that and my suggestion. Something like:

torch.nn.Linear(D_in,D_out,bias=False)

Note that a Linear module is more than just a Variable.
Check the difference:

def linear(input):
    x = Variable(torch.rand(2, 2))
    return input

and

def linear(input):
    x = Variable(torch.rand(2, 2))
    return torch.matmul(input, x)
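
To make the distinction concrete, the sketch below (assuming a recent PyTorch) shows that a bias-free nn.Linear is one registered parameter plus the computation applied to it in forward. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 2, bias=False)
x = torch.randn(5, 3)

# The module owns exactly one parameter, its weight of shape (2, 3) ...
param_names = [name for name, _ in lin.named_parameters()]

# ... and its forward pass is the computation x @ weight^T.
manual = x @ lin.weight.t()
same = torch.allclose(lin(x), manual, atol=1e-6)
```

The weight by itself is just data; the module is what pairs it with the matrix multiplication.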

I followed your method and it works well when I use only one GPU.
However, when I try to run it with DataParallel, I get this error:

RuntimeError: Expected tensor for argument #1 ‘input’ to have the same device as tensor for argument #2 ‘weight’; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

Any suggestions?
Thanks!