Hi,
My data is of shape `batch_size × num_paths × num_edges × emb_dim`. Here:

- `batch_size`: the batch size (32)
- `num_paths`: the number of paths between two given terms in a dependency tree
- `num_edges`: the number of edges in a specific path
- `emb_dim`: the embedding dimension (300)
Both `num_paths` and `num_edges` vary across training samples. In other words, every training sample may have a different number of paths, each with a different number of edges.
(Note that `data` is an n-d list of lists in native Python, since two of its dimensions, `num_paths` and `num_edges`, are variable.)
I want to pass each path of each training instance as input to an LSTM, since a path is a sequence of edges and I want the resultant path representation after passing it through the LSTM. After obtaining the representation for each path in a training example, I want to take the sum of all these representations.
I know I can handle a variable number of edges with `pack_padded_sequence`. But what about a variable number of paths? How do I account for that? Is there any way to do it in native PyTorch, without resorting to messy solutions like iterating in Python loops?
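For concreteness, here is a minimal sketch of the kind of thing I have in mind (the toy sizes, the flattening of all paths into one big "path batch", and the scatter-sum back per sample are just my guesses at what a loop-free approach might look like, not a working solution I'm committed to):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

emb_dim, hidden_dim = 300, 128
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

# Toy batch of 2 samples; each sample is a list of paths,
# each path a (num_edges, emb_dim) tensor of edge embeddings.
data = [
    [torch.randn(3, emb_dim), torch.randn(5, emb_dim)],  # sample 0: 2 paths
    [torch.randn(2, emb_dim)],                           # sample 1: 1 path
]

# Flatten all paths across the batch into one "path batch",
# remembering which sample each path came from.
paths = [p for sample in data for p in sample]
lengths = torch.tensor([p.size(0) for p in paths])
sample_ids = torch.tensor([i for i, s in enumerate(data) for _ in s])

padded = pad_sequence(paths, batch_first=True)           # (total_paths, max_edges, emb_dim)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
_, (h_n, _) = lstm(packed)                               # h_n: (1, total_paths, hidden_dim)
path_repr = h_n.squeeze(0)                               # (total_paths, hidden_dim)

# Sum the path representations per training sample (scatter-sum).
out = torch.zeros(len(data), hidden_dim).index_add_(0, sample_ids, path_repr)
# out: (batch_size, hidden_dim)
```

So `pack_padded_sequence` covers the edge dimension, and `index_add_` sums over paths, but I'm not sure this is the right or cleanest way to deal with the variable `num_paths` dimension.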
Any help would be greatly appreciated!
Thanks