Ricky_Han
(Ricky Han)
November 28, 2017, 7:55pm
1
-- linear layer with multiple copies of parameters
-- which parameters to use is set by model_ids
function linear_multi(sz_in, sz_out, model_ids, input)
    if g_opts.nmodels == 1 then
        return nn.Linear(sz_in, sz_out)(input)
    end
    local weight_lut = nn.LookupTable(g_opts.nmodels, sz_in * sz_out)(model_ids)
    weight_lut.data.module.updateGradInput = function(self, input) return input end
    local bias_lut = nn.LookupTable(g_opts.nmodels, sz_out)(model_ids)
    bias_lut.data.module.updateGradInput = function(self, input) return input end
    local weight_view = nn.View(sz_out, sz_in):setNumInputDims(1)(weight_lut)
    input = nn.View(-1, 1):setNumInputDims(1)(input)
    local out = nn.MM(false, false)({weight_view, input})
    out = nn.View(-1):setNumInputDims(2)(out)
    out = nn.CAddTable()({out, bias_lut})
    out.weight_lut = weight_lut
    out.bias_lut = bias_lut
    return out
end
The above code implements a simple matrix multiplication in Torch. I am trying to translate it into PyTorch, but LookupTable is deprecated. How would you approach this?
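For context, here is my reading of the shapes involved (an assumption on my part, not stated anywhere in the code):

# Shape sketch of linear_multi (my reading, untested; ids 1-indexed as in Lua):
#   model_ids : (N,)  one integer id per example, in [1, g_opts.nmodels]
#   input     : (N, sz_in)
# For each example i, using its own per-id parameters:
#   W_i    = weight_lut[model_ids[i]].view(sz_out, sz_in)   # per-id weight
#   b_i    = bias_lut[model_ids[i]]                          # per-id bias
#   out[i] = W_i @ input[i] + b_i                            # shape (sz_out,)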
Thanks in advance.
SimonW
(Simon Wang)
November 28, 2017, 8:02pm
2
I’m not very familiar with legacy code. How different is LookupTable from Embedding?
Ricky_Han
(Ricky Han)
November 28, 2017, 8:06pm
3
It seems Embedding is the way to go since they have the same input and output.
However, I am having trouble translating this line:
weight_lut.data.module.updateGradInput = function(self, input) return input end
to PyTorch.
Do I need to subclass Embedding and supply a custom backward function?
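A minimal sanity check that Embedding does the same id-to-row lookup (a toy example of my own; note that ids are 0-indexed in PyTorch):

import torch
from torch import nn
from torch.autograd import Variable

# nn.Embedding, like Lua's nn.LookupTable, maps integer ids to rows
# of a learnable weight matrix.
emb = nn.Embedding(4, 6)                     # 4 models, 6 values each
ids = Variable(torch.LongTensor([0, 2, 2]))  # three lookups
rows = emb(ids)                              # one weight row per id
assert rows.size() == (3, 6)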
SimonW
(Simon Wang)
November 28, 2017, 8:10pm
4
What is .data.module.updateGradInput supposed to do?
Ricky_Han
(Ricky Han)
November 28, 2017, 8:12pm
5
It seems that in Torch, backward simply calls updateGradInput and then accGradParameters, according to https://bigaidream.gitbooks.io/subsets_ml_cookbook/content/dl/lua/lua_module.html
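If that is right, then a PyTorch autograd.Function's backward would cover both phases at once: gradients w.r.t. the inputs (updateGradInput) and w.r.t. the parameters (accGradParameters). A minimal sketch of that mapping (my assumption, not authoritative; current-style static methods):

import torch

# forward ~ updateOutput; backward ~ updateGradInput + accGradParameters,
# returning one gradient per forward argument.
class MulByWeight(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x * weight

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        grad_x = grad_out * weight   # ~ updateGradInput
        grad_weight = grad_out * x   # ~ accGradParameters
        return grad_x, grad_weight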
SimonW
(Simon Wang)
November 28, 2017, 8:46pm
6
So I guess it specifies custom “gradients” w.r.t. the input variables? Maybe this could work in PyTorch: http://pytorch.org/docs/master/nn.html#torch.nn.Module.register_backward_hook ?
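A minimal usage sketch of that hook (toy module, my own example): the hook receives grad_input and grad_output as tuples, and may return a replacement for grad_input.

import torch
from torch import nn
from torch.autograd import Variable

lin = nn.Linear(10, 20)

def inspect(module, grad_input, grad_output):
    # print the shapes; returning a tuple here would override grad_input
    print([g.size() if g is not None else None for g in grad_input])

lin.register_backward_hook(inspect)
out = lin(Variable(torch.rand(3, 10), requires_grad=True))
out.sum().backward()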
Ricky_Han
(Ricky Han)
November 28, 2017, 8:52pm
7
Thank you so much.
Do you know how to translate this piece of code to PyTorch?
(the same linear_multi function quoted above, from a BSD-licensed Facebook source file)
Here is what I have done so far:
from torch import nn
from torch.autograd import Variable
import torch

def linear_multi(nmodels, sz_in, sz_out, model_ids, input):
    if nmodels == 1:
        return nn.Linear(sz_in, sz_out)(input)
    weight_lut = nn.Embedding(nmodels, sz_in * sz_out)(model_ids)
    bias_lut = nn.Embedding(nmodels, sz_out)(model_ids)
    weight_view = weight_lut.view(sz_out, sz_in)
    a, b, c = input.size()
    input = input.view(a*b, c)
    out = torch.mm(weight_view, input)
    a, b, c, d = out.size()
    out = out.view(-1)
    out = out.add(bias_lut)
    out.weight_lut = weight_lut
    out.bias_lut = bias_lut

class TestModule(nn.Module):
    def __init__(self, nmodels):
        # g_opts = {nmodels = 2}
        # lm = linear_multi(10, 20, ids, input)
        # w = lm.weight_lut.data.module
        # b = lm.bias_lut.data.module
        # w.weight[1]:copy(l1.weight:view(-1))
        # w.weight[2]:copy(l2.weight:view(-1))
        # b.weight[1]:copy(l1.bias)
        # b.weight[2]:copy(l2.bias)
        # model = nn.gModule({input, ids}, {lm})
        self.nmodels = nmodels

    def forward(self, ids, input):
        lm = linear_multi(self.nmodels, 10, 20, ids, input)

if __name__ == "__main__":
    l1 = nn.Linear(10, 20)
    l2 = nn.Linear(10, 20)
    x = Variable(torch.rand(3, 10))
    y1 = l1.forward(x)
    y2 = l2.forward(x)
    model = TestModule(2)
    model.forward(torch.LongTensor([1,2,1]), x)
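One thing I suspect is wrong in the above: torch.mm only takes 2-D operands, so with a per-example weight of shape N x sz_in x sz_out a batched product is probably needed instead (a toy sketch):

import torch

# torch.mm is strictly 2-D; per-example weights need a batched product.
x = torch.rand(3, 1, 10)   # N x 1 x sz_in
W = torch.rand(3, 10, 20)  # N x sz_in x sz_out
out = torch.bmm(x, W)      # N x 1 x sz_out; torch.matmul(x, W) also works
assert out.size() == (3, 1, 20)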
SimonW
(Simon Wang)
November 28, 2017, 9:04pm
8
Sorry, I’m not quite familiar with lua torch. But what you have so far seems on the right track.
Ricky_Han
(Ricky Han)
November 28, 2017, 10:59pm
9
Thank you very much.
After 2 hours, this seems to be working:
from torch import nn
from torch.autograd import Variable
import torch

def linear_multi(nmodels, sz_in, sz_out, model_ids, input):
    if nmodels == 1:
        return nn.Linear(sz_in, sz_out)(input)
    # XXX: potential bug - updateGradInput is overridden in the Lua version,
    # possible use of `register_backward_hook`
    weight_lut = nn.Embedding(nmodels, sz_in * sz_out)(model_ids)  # 1x3x200
    bias_lut = nn.Embedding(nmodels, sz_out)(model_ids)            # 1x3x20
    weight_view = weight_lut.view(-1, sz_in, sz_out)               # 3x10x20
    bias_view = bias_lut.view(-1, sz_out)                          # 3x20
    a, b = input.size()
    input = input.view(a, 1, b)                                    # 3x1x10
    out = torch.matmul(input, weight_view)                         # 3x1x20
    a, b, c = out.size()
    out = out.view(a, c)                                           # 3x20
    out = out.add(bias_view)                                       # 3x20
    return out

class TestModule(nn.Module):
    """
    g_opts = {nmodels = 2}
    lm = linear_multi(10, 20, ids, input)
    model = nn.gModule({input, ids}, {lm})
    """
    def __init__(self, nmodels):
        super(TestModule, self).__init__()  # required before setting attributes
        self.nmodels = nmodels

    def forward(self, input, ids):
        lm = linear_multi(self.nmodels, 10, 20, ids, input)
        return lm

if __name__ == "__main__":
    l1 = nn.Linear(10, 20)
    l2 = nn.Linear(10, 20)
    x = Variable(torch.rand(3, 10))
    y1 = l1.forward(x)
    y2 = l2.forward(x)
    model = TestModule(3)  # 3 models; ids 0, 1, 2 here (Lua ids are 1-indexed)
    y = model.forward(x, Variable(torch.LongTensor([[1, 2, 1]])))
    assert y.size()[0] == 3
    assert y.size()[1] == 20
    assert y.dim() == 2
    # Note: in the original test, the weights of l1 and l2 are copied into
    # linear_multi's weights, then the matmul results are checked to match.
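A sketch of that weight-copy check (my addition, untested against the original): since linear_multi views each weight row as (sz_in, sz_out) and computes input @ W, while nn.Linear stores its weight as (sz_out, sz_in) and computes input @ weight.t() + bias, copying l1's weight in requires a transpose:

import torch
from torch import nn
from torch.autograd import Variable

l1 = nn.Linear(10, 20)
x = Variable(torch.rand(3, 10))

# The (sz_in, sz_out) layout used by linear_multi corresponds to l1.weight.t(),
# so this row would be what gets copied into the weight embedding.
W = l1.weight.t().contiguous()   # 10 x 20
out = x.matmul(W) + l1.bias      # bias broadcasts over the batch
assert (out.data - l1(x).data).abs().max() < 1e-5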
Hey Ricky
Did you succeed with the translation from Torch to PyTorch for CommNet?