List of nn.Module in a nn.Module


(Ng P Dat) #1

Hello all,
I want to create an RNN-like module with a fixed number of timesteps. The weights of each timestep are untied (not shared).
To achieve that, I make a plain Python list and append a separate linear module to it for each step. But when I call functions such as print() or .cuda() on the containing module, they do not recognize the modules kept in the list.
Example to illustrate my problem:

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100) #dummy module
        self.linear_combines1 = []
        self.linear_combines2 = []
        for i in range(step):
            self.linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            self.linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))

net = testNet(128, 256, 3)
print(net) # won't print the modules kept in the lists
net.cuda() # won't move the modules in the lists to the GPU
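
The parameters of the layers kept in those lists are also missing from net.parameters(), so an optimizer would silently skip them. For example:

print(len(list(net.parameters()))) # prints 2: only self.linear's weight and bias are registered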

What is the intended correct way to do this?


(Francisco Massa) #2

Hi,

We had a discussion about a similar thing yesterday on the PyTorch Slack channel, and allowing parameters/modules inside plain lists is tricky.

But we can work around that with a dedicated Module (similar to Sequential) that exposes its children as if it were a list.
Here is an example:

import torch
import torch.nn as nn

class ListModule(nn.Module):
    def __init__(self, *args):
        super(ListModule, self).__init__()
        idx = 0
        for module in args:
            self.add_module(str(idx), module)
            idx += 1

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self._modules):
            raise IndexError('index {} is out of range'.format(idx))
        it = iter(self._modules.values())
        for i in range(idx):
            next(it)
        return next(it)

    def __iter__(self):
        return iter(self._modules.values())

    def __len__(self):
        return len(self._modules)

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100) #dummy module
        linear_combines1 = []
        linear_combines2 = []
        for i in range(step):
            linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))
        self.linear_combines1 = ListModule(*linear_combines1)
        self.linear_combines2 = ListModule(*linear_combines2)

net = testNet(128, 256, 3)
print(net)
net.cuda()

print(net.linear_combines1[0])
print(len(net.linear_combines2))
for i in net.linear_combines1:
    print(i.weight.data.type())

Note that my example implementation does not define a forward in ListModule; you are supposed to index into it to get the corresponding module.
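
For instance, a forward pass for the testNet above can just index into the lists at each step. A rough sketch (the tanh recurrence here is only an illustrative choice, and it assumes import torch.nn.functional as F):

    def forward(self, input, hidden):
        # input: (seq_len, batch, input_dim); hidden: (batch, hidden_dim) -- assumed layout
        for t in range(len(self.linear_combines1)):
            hidden = F.tanh(self.linear_combines1[t](input[t]) + self.linear_combines2[t](hidden))
        return hidden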


(Adam Paszke) #3

To complement @fmassa’s post: it fails because we only capture modules that are assigned directly as attributes of the Module object. It gets too tricky and bug-prone otherwise. There are a number of tricks you can use to get around it, with ListModule shown above being one of them. If I were to suggest something, I’d keep all the modules in a single container like this:

import torch.nn as nn
import torch.nn.functional as F


class AttrProxy(object):
    """Translates index lookups into attribute lookups."""
    def __init__(self, module, prefix):
        self.module = module
        self.prefix = prefix

    def __getitem__(self, i):
        return getattr(self.module, self.prefix + str(i))


class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        for i in range(steps):
            self.add_module('i2h_' + str(i), nn.Linear(input_dim, hidden_dim))
            self.add_module('h2h_' + str(i), nn.Linear(hidden_dim, hidden_dim))
        self.i2h = AttrProxy(self, 'i2h_')
        self.h2h = AttrProxy(self, 'h2h_')

    def forward(self, input, hidden):
        # here, use self.i2h[t] and self.h2h[t] to index 
        # input2hidden and hidden2hidden modules for each step,
        # or loop over them, like in the example below
        # (assuming first dim of input is sequence length)
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden

(Ng P Dat) #4

Thank you! Both ideas are great. I took some time to combine the two ideas, and here is my take on it:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class ListModule(object):
    # should work with any kind of module
    def __init__(self, module, prefix, *args):
        self.module = module
        self.prefix = prefix
        self.num_module = 0
        for new_module in args:
            self.append(new_module)

    def append(self, new_module):
        if not isinstance(new_module, nn.Module):
            raise ValueError('Not a Module')
        else:
            self.module.add_module(self.prefix + str(self.num_module), new_module)
            self.num_module += 1

    def __len__(self):
        return self.num_module

    def __getitem__(self, i):
        if i < 0 or i >= self.num_module:
            raise IndexError('Out of bound')
        return getattr(self.module, self.prefix + str(i))


class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        self.i2h = ListModule(self, 'i2h_')
        self.h2h = ListModule(self, 'h2h_')
        for i in range(steps):
            self.i2h.append(nn.Linear(input_dim, hidden_dim))
            self.h2h.append(nn.Linear(hidden_dim, hidden_dim))

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden

net = testNet(128, 256, 3)
print(net)
net.cuda()
inp = Variable(torch.randn(3, 4, 128)).cuda()
init = Variable(torch.randn(4, 256)).cuda()
out = net(inp, init)

It works like a Python list, as in @fmassa’s idea, but the modules themselves are kept in the caller’s container, as in @apaszke’s idea.
By the way, is it possible to include this in the main PyTorch package?


(Alfredo Canziani) #5

Everything here is very informative. Thank you.
Looking forward to its inclusion in the main PyTorch package too!


(Ng P Dat) #6

PyTorch now officially supports lists of Modules/Parameters, according to this commit: https://github.com/pytorch/pytorch/commit/c7c8aaa7f040dd449dbc6aca9204b2f943aef477
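
With the new containers, the original example boils down to something like the following (a rough sketch, assuming the nn.ModuleList container that the commit refers to):

import torch
import torch.nn as nn
import torch.nn.functional as F

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        # nn.ModuleList registers every contained module, so print() and .cuda() see them
        self.i2h = nn.ModuleList([nn.Linear(input_dim, hidden_dim) for _ in range(steps)])
        self.h2h = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(steps)])

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden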


(Adam Paszke) #7

Yup, just wanted to wait until I write the docs before posting here. They should be up today or tomorrow.


(Alfredo Canziani) #8

@apaszke, I cannot find the documentation you are talking about.
Could you share a link, perhaps? Also, do I need to install from source to get this feature, or will the Conda package suffice?


(Adam Paszke) #9

The docs aren’t merged yet. I’ve been working on bug fixes.


(Brando Miranda) #10

In your example the elements of *args are already modules. How does one convert a (trainable) Variable into a module? This fails for me:

    self.W = Variable(w_init, requires_grad=True)
    self.mod_list = torch.nn.ModuleList([self.W])

(Francisco Massa) #11

What do you want to do exactly?
A Variable is just a variable and doesn’t imply any computation, so making a Module from a Variable alone doesn’t make sense to me. A Module carries information about how its different Variables are operated on together to produce an output.


(Brando Miranda) #12

A linear “module” without a bias is considered a module even though it’s essentially just a Variable (a matrix with some dimensions). I don’t see the difference between that and my suggestion. Something like:

torch.nn.Linear(D_in,D_out,bias=False)

(Francisco Massa) #13

Note that a Linear module is more than just a Variable.
Check the difference:

def linear(input):
    x = Variable(torch.rand(2, 2))  # x exists, but is never used in any computation
    return input

and

def linear(input):
    x = Variable(torch.rand(2, 2))
    return torch.matmul(input, x)  # x participates in the computation that produces the output
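
To make the contrast concrete, a bias-free linear layer written as a Module would register its weight as an nn.Parameter and describe the computation in forward. A rough sketch of the idea (not the actual nn.Linear source):

import torch
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, D_in, D_out):
        super(MyLinear, self).__init__()
        # nn.Parameter is what registers a trainable tensor on a Module
        self.weight = nn.Parameter(torch.randn(D_in, D_out))

    def forward(self, input):
        # the Module records how the weight is used, not just that it exists
        return torch.matmul(input, self.weight)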