Hello all,
I want to create an RNN-like module with a fixed number of timesteps, where the weights of each timestep are untied (not shared).
To achieve that, I make a Python list and append a separate linear module for each timestep. But when I call functions such as print() or .cuda() on the parent module, they do not properly recognize the modules held in the list.
Here is an example to illustrate my problem:
import torch.nn as nn

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100)  # dummy module
        self.linear_combines1 = []
        self.linear_combines2 = []
        for i in range(step):
            self.linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            self.linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))

net = testNet(128, 256, 3)
print(net)   # won't print what is in the lists
net.cuda()   # won't move the modules in the lists to the GPU
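To make the symptom concrete (a small check of my own, assuming the class above): the Linear layers kept in plain Python lists never get registered, so they are invisible to parameters(), .cuda(), and the optimizer:

net = testNet(128, 256, 3)
# only the weight and bias of self.linear are registered (2 tensors);
# the 6 Linear layers stored in the plain lists are not counted here
print(len(list(net.parameters())))  # prints 2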
But we can work around that with a dedicated Module (similar to Sequential) that exposes its children as if it were a list.
Here is an example:
import torch
import torch.nn as nn

class ListModule(nn.Module):
    def __init__(self, *args):
        super(ListModule, self).__init__()
        idx = 0
        for module in args:
            self.add_module(str(idx), module)
            idx += 1

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self._modules):
            raise IndexError('index {} is out of range'.format(idx))
        it = iter(self._modules.values())
        for i in range(idx):
            next(it)
        return next(it)

    def __iter__(self):
        return iter(self._modules.values())

    def __len__(self):
        return len(self._modules)

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, step=1):
        super(testNet, self).__init__()
        self.linear = nn.Linear(100, 100)  # dummy module
        linear_combines1 = []
        linear_combines2 = []
        for i in range(step):
            linear_combines1.append(nn.Linear(input_dim, hidden_dim))
            linear_combines2.append(nn.Linear(hidden_dim, hidden_dim))
        self.linear_combines1 = ListModule(*linear_combines1)
        self.linear_combines2 = ListModule(*linear_combines2)

net = testNet(128, 256, 3)
print(net)
net.cuda()
print(net.linear_combines1[0])
print(len(net.linear_combines2))
for i in net.linear_combines1:
    print(i.weight.data.type())
Note that my example implementation of ListModule does not define a forward(); you are expected to index it to get the corresponding submodule.
To complement @fmassa's post: it fails because PyTorch only registers modules that are assigned directly as attributes of a Module object; it would get too tricky and bug-prone otherwise. There are a number of tricks you can use to get around it, with ListModule shown above being one of them. If I were to suggest something, I'd keep all the modules in a single container like this:
import torch.nn as nn
import torch.nn.functional as F

class AttrProxy(object):
    """Translates index lookups into attribute lookups."""
    def __init__(self, module, prefix):
        self.module = module
        self.prefix = prefix

    def __getitem__(self, i):
        return getattr(self.module, self.prefix + str(i))

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        for i in range(steps):
            self.add_module('i2h_' + str(i), nn.Linear(input_dim, hidden_dim))
            self.add_module('h2h_' + str(i), nn.Linear(hidden_dim, hidden_dim))
        self.i2h = AttrProxy(self, 'i2h_')
        self.h2h = AttrProxy(self, 'h2h_')

    def forward(self, input, hidden):
        # here, use self.i2h[t] and self.h2h[t] to index
        # input2hidden and hidden2hidden modules for each step,
        # or loop over them, like in the example below
        # (assuming first dim of input is sequence length)
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden
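A quick usage sketch of the proxy approach (my addition, using the same toy sizes as the earlier examples):

net = testNet(128, 256, 3)
print(net)          # the i2h_* and h2h_* children are registered, so they are printed
print(net.i2h[0])   # AttrProxy turns this into getattr(net, 'i2h_0')
net.cuda()          # and .cuda() moves all of them to the GPU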
Thank you! Both ideas are great. I took some time to combine the two ideas, and here is my take on it:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class ListModule(object):
    # Should work with any kind of module
    def __init__(self, module, prefix, *args):
        self.module = module
        self.prefix = prefix
        self.num_module = 0
        for new_module in args:
            self.append(new_module)

    def append(self, new_module):
        if not isinstance(new_module, nn.Module):
            raise ValueError('Not a Module')
        else:
            self.module.add_module(self.prefix + str(self.num_module), new_module)
            self.num_module += 1

    def __len__(self):
        return self.num_module

    def __getitem__(self, i):
        if i < 0 or i >= self.num_module:
            raise IndexError('Out of bound')
        return getattr(self.module, self.prefix + str(i))

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        self.steps = steps
        self.i2h = ListModule(self, 'i2h_')
        self.h2h = ListModule(self, 'h2h_')
        for i in range(steps):
            self.i2h.append(nn.Linear(input_dim, hidden_dim))
            self.h2h.append(nn.Linear(hidden_dim, hidden_dim))

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = F.tanh(i2h(inp) + h2h(hidden))
        return hidden

net = testNet(128, 256, 3)
print(net)
net.cuda()
inp = Variable(torch.randn(3, 4, 128)).cuda()
init = Variable(torch.randn(4, 256)).cuda()
out = net(inp, init)
It works like a Python list, as in @fmassa's idea, but the modules themselves are registered on the caller's module, as in @apaszke's idea.
By the way, is it possible to include this in the main PyTorch package?
@apaszke, I cannot find the documentation you are talking about.
Could you share a link, perhaps? Furthermore, shall I install from source to get this feature, or will Conda suffice?
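For reference (an aside, not part of the original exchange): the container that PyTorch eventually shipped for exactly this use case is nn.ModuleList. Assuming a reasonably recent PyTorch, the network above can be written as:

import torch
import torch.nn as nn

class testNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, steps=1):
        super(testNet, self).__init__()
        # nn.ModuleList registers every contained module, so print(),
        # .cuda(), and .parameters() all see them
        self.i2h = nn.ModuleList([nn.Linear(input_dim, hidden_dim) for _ in range(steps)])
        self.h2h = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(steps)])

    def forward(self, input, hidden):
        for inp, i2h, h2h in zip(input, self.i2h, self.h2h):
            hidden = torch.tanh(i2h(inp) + h2h(hidden))
        return hidden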
What do you want to do exactly?
A Variable is a variable, and doesn't imply any computation, so making a Module from just a Variable doesn't make sense to me. A Module carries information about how its different Variables are operated on together to produce an output.
A linear "module" without a bias is considered a module even though it is simply just a Variable (a matrix with some dimensions). I don't see the difference between that and my suggestion. Something like:
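A rough sketch of the kind of thing being described (my reconstruction; the class and parameter names are illustrative, not from the thread):

import torch
import torch.nn as nn

class BiasFreeLinear(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BiasFreeLinear, self).__init__()
        # the whole module is really just one weight matrix
        self.weight = nn.Parameter(torch.randn(input_dim, output_dim))

    def forward(self, x):
        return x.mm(self.weight)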
I followed your method and it works well when I only use one GPU.
However, when I try to run it with DataParallel, it gives me this error:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)