I want to share a single matrix variable across input and output variable, ie per “Using the Output Embedding to Improve Language Models”, by Press and Wolf.
It seems like a clean-ish way to do this would be something like:
W = autograd.Variable(torch.rand(dim1, dim2), requires_grad=True)
input_embedding = tf.nn.Functional(W.transpose(0, 1))
output_embedding = tf.nn.Functional(W)
However, looks like embedding
is not in Functional
? So, I could create a standard non-fucntional Embedding
, then grab the weight variable from that, and re-use that, in a simple matrix multiplication, but it seems a little ‘hacky’ somehow? So wondering what are the cleanest option(s) for this?