Hello all,
I want to create an RNN-like module with a fixed number of timesteps, where the weights of each timestep are untied (not shared).
To achieve that, I make a Python list and append a separate linear module for each timestep. But when I call functions such as print() or .cuda() on the parent module, they do not properly recognize the modules held in the list.
Here is an example to illustrate my problem:
import torch.nn as nn

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100)  # dummy module
        self.linear_combines1 = []
        self.linear_combines2 = []
        for i in range(step):
            self.linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            self.linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))

net = testNet(128, 256, 3)
print(net)   # won't print what is in the lists
net.cuda()   # won't move the modules in the lists to the GPU
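To make the symptom concrete (a small check of my own, assuming the class above): the Linear layers kept in plain Python lists never get registered, so they are invisible to parameters(), .cuda(), and the optimizer:

net = testNet(128, 256, 3)
# only the weight and bias of self.linear are registered (2 tensors);
# the 6 Linear layers stored in the plain lists are not counted here
print(len(list(net.parameters())))  # prints 2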
But we can work around that with a dedicated Module (similar to Sequential) that exposes its children as if it were a list.
Here is an example:
import torch
import torch.nn as nn

class ListModule(nn.Module):
    def __init__(self, *args):
        super(ListModule, self).__init__()
        idx = 0
        for module in args:
            self.add_module(str(idx), module)
            idx += 1

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self._modules):
            raise IndexError('index {} is out of range'.format(idx))
        it = iter(self._modules.values())
        for i in range(idx):
            next(it)
        return next(it)

    def __iter__(self):
        return iter(self._modules.values())

    def __len__(self):
        return len(self._modules)

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100)  # dummy module
        linear_combines1 = []
        linear_combines2 = []
        for i in range(step):
            linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))
        self.linear_combines1 = ListModule(*linear_combines1)
        self.linear_combines2 = ListModule(*linear_combines2)

net = testNet(128, 256, 3)
print(net)
net.cuda()
print(net.linear_combines1[0])
print(len(net.linear_combines2))
for i in net.linear_combines1:
    print(i.weight.data.type())
Note that my example implementation of ListModule does not define a forward(); you are expected to index it to get the corresponding submodule.
To complement @fmassa's post: it fails because PyTorch only registers modules that are assigned directly as attributes of a Module object; it would get too tricky and bug-prone otherwise. There are a number of tricks you can use to get around it, with ListModule shown above being one of them. If I were to suggest something, I'd keep all the modules in a single container like this:
import torch.nn as nn
import torch.nn.functional as F

class AttrProxy(object):
    """Translates index lookups into attribute lookups."""
    def __init__(self, module, prefix):
        self.module = module
        self.prefix = prefix

    def __getitem__(self, i):
        return getattr(self.module, self.prefix + str(i))

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        for i in range(steps):
            self.add_module('i2h_' + str(i), nn.Linear(input_dim, hidden_dim))
            self.add_module('h2h_' + str(i), nn.Linear(hidden_dim, hidden_dim))
        self.i2h = AttrProxy(self, 'i2h_')
        self.h2h = AttrProxy(self, 'h2h_')

    def forward(self, input, hidden):
        # here, use self.i2h[t] and self.h2h[t] to index
        # input2hidden and hidden2hidden modules for each step,
        # or loop over them, like in the example below
        # (assuming first dim of input is sequence length)
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden
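A quick usage sketch of the proxy approach (my addition, using the same toy sizes as the earlier examples):

net = testNet(128, 256, 3)
print(net)          # the i2h_* and h2h_* children are registered, so they are printed
print(net.i2h[0])   # AttrProxy turns this into getattr(net, 'i2h_0')
net.cuda()          # and .cuda() moves all of them to the GPU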
Thank you! Both ideas are great. I took some time to combine the two ideas, and here is my take on it:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class ListModule(object):
    # Should work with any kind of module
    def __init__(self, module, prefix, *args):
        self.module = module
        self.prefix = prefix
        self.num_module = 0
        for new_module in args:
            self.append(new_module)

    def append(self, new_module):
        if not isinstance(new_module, nn.Module):
            raise ValueError('Not a Module')
        else:
            self.module.add_module(self.prefix + str(self.num_module), new_module)
            self.num_module += 1

    def __len__(self):
        return self.num_module

    def __getitem__(self, i):
        if i < 0 or i >= self.num_module:
            raise IndexError('Out of bound')
        return getattr(self.module, self.prefix + str(i))

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        self.i2h = ListModule(self, 'i2h_')
        self.h2h = ListModule(self, 'h2h_')
        for i in range(steps):
            self.i2h.append(nn.Linear(input_dim, hidden_dim))
            self.h2h.append(nn.Linear(hidden_dim, hidden_dim))

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden

net = testNet(128, 256, 3)
print(net)
net.cuda()
inp = Variable(torch.randn(3, 4, 128)).cuda()
init = Variable(torch.randn(4, 256)).cuda()
out = net(inp, init)
It works like a Python list, as in @fmassa's idea, but the modules themselves are registered on the caller's module, as in @apaszke's idea.
By the way, is it possible to include this in the main PyTorch package?
@apaszke, I cannot find the documentation you are talking about.
Could you share a link, perhaps? Furthermore, shall I install from source to get this feature, or will Conda suffice?
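For reference (an aside, not part of the original exchange): the container that PyTorch eventually shipped for exactly this use case is nn.ModuleList. Assuming a reasonably recent PyTorch, the network above can be written as:

import torch
import torch.nn as nn

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        # nn.ModuleList registers every contained module, so print(),
        # .cuda(), and .parameters() all see them
        self.i2h = nn.ModuleList([nn.Linear(input_dim, hidden_dim) for _ in range(steps)])
        self.h2h = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(steps)])

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = torch.tanh(i2h(inp) + h2h(hidden))
        return hidden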
What do you want to do exactly?
A Variable is a variable, and doesn't imply any computation, so making a Module from just a Variable doesn't make sense to me. A Module carries information about how its different Variables are operated on together to produce an output.
A linear "module" without a bias is considered a module even though it is simply just a Variable (a matrix with some dimensions). I don't see the difference between that and my suggestion. Something like:
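A rough sketch of the kind of thing being described (my reconstruction; the class and parameter names are illustrative, not from the thread):

import torch
import torch.nn as nn

class BiasFreeLinear(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BiasFreeLinear, self).__init__()
        # the whole module is really just one weight matrix
        self.weight = nn.Parameter(torch.randn(input_dim, output_dim))

    def forward(self, x):
        return x.mm(self.weight)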
I followed your method and it works well when I only use one GPU.
However, when I try to run it with DataParallel, it gives me this error:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)