LookupTable in PyTorch

-- linear layer with multiple copies of parameters
-- which parameters to use is set by model_ids
function linear_multi(sz_in, sz_out, model_ids, input)
    if g_opts.nmodels == 1 then
        return nn.Linear(sz_in, sz_out)(input)
    end
    local weight_lut = nn.LookupTable(g_opts.nmodels, sz_in * sz_out)(model_ids)
    weight_lut.data.module.updateGradInput = function(self, input) return input end
    local bias_lut = nn.LookupTable(g_opts.nmodels, sz_out)(model_ids)
    bias_lut.data.module.updateGradInput = function(self, input) return input end
    local weight_view = nn.View(sz_out, sz_in):setNumInputDims(1)(weight_lut)
    input = nn.View(-1, 1):setNumInputDims(1)(input)
    local out = nn.MM(false, false)({weight_view, input})
    out = nn.View(-1):setNumInputDims(2)(out)
    out = nn.CAddTable()({out, bias_lut})
    out.weight_lut = weight_lut
    out.bias_lut = bias_lut
    return out
end

The above code implements a simple matrix multiplication in (Lua) Torch. I am trying to translate it into PyTorch, but LookupTable is deprecated. How would you approach this?

Thanks in advance.

I’m not very familiar with legacy code. How different is LookupTable from Embedding?

It seems Embedding is the way to go since they have the same input and output.
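
For anyone else reading, here is a minimal sketch of how nn.Embedding behaves (assuming a recent PyTorch); as far as I can tell it does the same job as LookupTable, with the one difference that indices are 0-based instead of 1-based:

import torch
from torch import nn

# nn.Embedding plays the role of Lua's nn.LookupTable: integer ids in,
# rows of a learnable (num_rows x row_size) weight matrix out.
# Note the ids are 0-based here, whereas LookupTable's were 1-based.
lut = nn.Embedding(5, 8)             # 5 rows ("models"), 8 values per row
ids = torch.LongTensor([0, 2, 0])    # one row index per sample
rows = lut(ids)                      # shape (3, 8), differentiable w.r.t. lut.weight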

However, I am having trouble translating this line:

weight_lut.data.module.updateGradInput = function(self, input) return input end

to PyTorch.

Do I need to subclass Embedding and supply a custom backward function?

What is .data.module.updateGradInput supposed to do?

It seems that in Torch, backward simply calls updateGradInput and then accGradParameters, according to https://bigaidream.gitbooks.io/subsets_ml_cookbook/content/dl/lua/lua_module.html

So I guess it specifies custom “gradients” w.r.t. the input variables? Maybe this could be done in PyTorch with http://pytorch.org/docs/master/nn.html#torch.nn.Module.register_backward_hook?
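
For what it's worth, if a custom gradient w.r.t. an input really is needed, a custom torch.autograd.Function seems cleaner than a backward hook. Below is just my guess at what the Lua override was emulating (an identity gradient), and it may well be unnecessary, since nn.Embedding never backpropagates into its integer index input anyway:

import torch

class PassThroughGrad(torch.autograd.Function):
    # Identity in the forward pass; the backward pass returns the incoming
    # gradient unchanged instead of computing a "real" gradient.
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

# usage: y = PassThroughGrad.apply(x)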

Thank you so much.

Do you know how to translate this piece of code to PyTorch?

Here is what I have done so far:



from torch import nn
from torch.autograd import Variable
import torch

def linear_multi(nmodels, sz_in, sz_out, model_ids, input):
    if nmodels == 1:
        return nn.Linear(sz_in, sz_out)(input)

    weight_lut = nn.Embedding(nmodels, sz_in * sz_out)(model_ids)
    bias_lut = nn.Embedding(nmodels, sz_out)(model_ids)
    weight_view = weight_lut.view(sz_out, sz_in)

    a, b, c = input.size()
    input = input.view(a*b, c)

    out = torch.mm(weight_view, input)

    a, b, c, d = out.size()
    out = out.view(-1)

    out = out.add(bias_lut)
    out.weight_lut = weight_lut
    out.bias_lut = bias_lut
    return out



class TestModule(nn.Module):
    def __init__(self, nmodels):
        # g_opts = {nmodels = 2}
        # lm = linear_multi(10, 20, ids, input)
        # w = lm.weight_lut.data.module
        # b = lm.bias_lut.data.module
        # w.weight[1]:copy(l1.weight:view(-1))
        # w.weight[2]:copy(l2.weight:view(-1))
        # b.weight[1]:copy(l1.bias)
        # b.weight[2]:copy(l2.bias)
        # model = nn.gModule({input, ids}, {lm})
        super(TestModule, self).__init__()
        self.nmodels = nmodels

    def forward(self, ids, input):
        lm = linear_multi(self.nmodels, 10, 20, ids, input)
        return lm

if __name__ == "__main__":
    l1 = nn.Linear(10, 20)
    l2 = nn.Linear(10, 20)
    x = Variable(torch.rand(3, 10))
    y1 = l1.forward(x)
    y2 = l2.forward(x)

    model = TestModule(2)
    model.forward(torch.LongTensor([1,2,1]), x)

Sorry, I’m not quite familiar with Lua Torch, but what you have so far seems to be on the right track.

Thank you very much.

After 2 hours, this seems to be working:


from torch import nn
from torch.autograd import Variable
import torch

def linear_multi(nmodels, sz_in, sz_out, model_ids, input):
    if nmodels == 1:
        return nn.Linear(sz_in, sz_out)(input)

    # XXX: potential bug - updateGradInput is overridden,
    # possible use of `register_backward_hook`

    weight_lut = nn.Embedding(nmodels, sz_in * sz_out)(model_ids) # 1x3x200
    bias_lut = nn.Embedding(nmodels, sz_out)(model_ids) # 1x3x20

    weight_view = weight_lut.view(-1, sz_in, sz_out) # 3 x 10 x 20
    bias_view = bias_lut.view(-1, sz_out) # 3x20

    a, b = input.size()
    input = input.view(a, 1, b) # 3x1x10

    out = torch.matmul(input, weight_view)# 3x1x20

    a, b, c = out.size()
    out = out.view(a, c) #3x20
    out = out.add(bias_view) # 3x20
    return out



class TestModule(nn.Module):
    """
        g_opts = {nmodels = 2}
        lm = linear_multi(10, 20, ids, input)
        model = nn.gModule({input, ids}, {lm})
    """
    def __init__(self, nmodels):
        super(TestModule, self).__init__()
        self.nmodels = nmodels

    def forward(self, input, ids):
        lm = linear_multi(self.nmodels, 10, 20, ids, input)
        return lm

if __name__ == "__main__":
    l1 = nn.Linear(10, 20)
    l2 = nn.Linear(10, 20)
    x = Variable(torch.rand(3, 10))
    y1 = l1.forward(x)
    y2 = l2.forward(x)

    model = TestModule(3) # Embedding ids are 0-based in PyTorch (0, 1, 2), vs 1-based in Lua
    y = model.forward(x, Variable(torch.LongTensor([[1,2,1]])))
    assert y.size()[0] == 3
    assert y.size()[1] == 20
    assert y.dim() == 2
    
    # Note: in the original Lua test, the weights of l1 and l2 are copied into
    # linear_multi's tables, and the matmul results are then checked to match.
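
To actually run that check in PyTorch, the Embedding tables have to live outside linear_multi so their rows can be overwritten. Here is a rough sketch of one way to do it (MultiLinear is just a name I made up, and this assumes a recent PyTorch where Variable is no longer needed): row 0 gets l1's parameters, row 1 gets l2's, and the per-sample outputs are compared.

import torch
from torch import nn

class MultiLinear(nn.Module):
    # One Linear's worth of parameters per model id, stored row-wise in
    # Embedding tables so a LongTensor of ids selects them per sample.
    def __init__(self, nmodels, sz_in, sz_out):
        super(MultiLinear, self).__init__()
        self.sz_in, self.sz_out = sz_in, sz_out
        self.weight_lut = nn.Embedding(nmodels, sz_in * sz_out)
        self.bias_lut = nn.Embedding(nmodels, sz_out)

    def forward(self, input, model_ids):
        # input: (batch, sz_in), model_ids: (batch,) of 0-based indices
        W = self.weight_lut(model_ids).view(-1, self.sz_out, self.sz_in)  # (batch, sz_out, sz_in)
        b = self.bias_lut(model_ids)                                      # (batch, sz_out)
        out = torch.bmm(W, input.unsqueeze(2)).squeeze(2)                 # (batch, sz_out)
        return out + b

l1, l2 = nn.Linear(10, 20), nn.Linear(10, 20)
ml = MultiLinear(2, 10, 20)
with torch.no_grad():
    ml.weight_lut.weight[0].copy_(l1.weight.view(-1))  # row 0 <- l1
    ml.weight_lut.weight[1].copy_(l2.weight.view(-1))  # row 1 <- l2
    ml.bias_lut.weight[0].copy_(l1.bias)
    ml.bias_lut.weight[1].copy_(l2.bias)

x = torch.rand(3, 10)
ids = torch.LongTensor([0, 1, 0])
y = ml(x, ids)
assert torch.allclose(y[0], l1(x[0]), atol=1e-6)  # sample 0 used l1's weights
assert torch.allclose(y[1], l2(x[1]), atol=1e-6)  # sample 1 used l2's weights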

Hey Ricky,
Did you succeed with the translation from Torch to PyTorch for CommNet?