I am wondering if there is any way to use the existing torchtext modules (Fields, Datasets, Iterators) to efficiently load data batches that contain both sentences and graphs. Specifically, I am trying to achieve the following:
    for batch in bucket_iterator:
        src_sents = batch.src_sents  # This is the src sentence, a LongTensor
        src_nodes = batch.src_nodes  # This is the src graph node labels (will convert to embedding), a LongTensor
        src_edges = batch.src_edges  # This is the src graph edges, a SparseTensor (from torch_sparse)
        tgt_sents = batch.tgt_sents  # This is the tgt sentence
        # do model training with above information
The main issue is that I want to keep using the torchtext modules, since they provide many features I need for preprocessing the text portion of my data, but I also need to load the graph associated with each text. I have not yet found an efficient way to load the graph. Is there perhaps a way to build custom fields for the graph nodes and edges?
Any help would be appreciated.
For clarification, please note that the graph is intended to work with modules in the pytorch_geometric library.
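To make the graph part concrete, here is a rough sketch of the batching I have in mind. It is plain Python only, not torchtext or pytorch_geometric API: the helper name `collate_graphs`, the `(tokens, node_labels, edges)` example layout, and the `graph_ids` key are all hypothetical. It assumes the graphs in a batch are merged into one large disconnected graph by shifting each graph's edge indices by the number of nodes that precede it, which is how I understand pytorch_geometric batches graphs.

```python
def collate_graphs(examples, pad_id=0):
    """Merge (tokens, node_labels, edges) examples into one batch.

    Hypothetical helper for illustration; not part of torchtext
    or pytorch_geometric.
    """
    max_len = max(len(tokens) for tokens, _, _ in examples)
    src_sents, src_nodes, graph_ids = [], [], []
    edge_src, edge_dst = [], []
    offset = 0  # number of nodes already placed in the merged graph
    for graph_id, (tokens, nodes, edges) in enumerate(examples):
        # Pad each sentence on the right to the longest one in the batch.
        src_sents.append(list(tokens) + [pad_id] * (max_len - len(tokens)))
        # Concatenate node labels and remember which graph each node came from.
        src_nodes.extend(nodes)
        graph_ids.extend([graph_id] * len(nodes))
        # Shift edge endpoints so they index into the merged node list.
        for u, v in edges:
            edge_src.append(u + offset)
            edge_dst.append(v + offset)
        offset += len(nodes)
    return {
        "src_sents": src_sents,
        "src_nodes": src_nodes,
        "src_edges": [edge_src, edge_dst],
        "graph_ids": graph_ids,
    }


# Two toy examples: token ids, node label ids, and local (source, target) edges.
examples = [
    ([5, 6, 7], [10, 11], [(0, 1)]),
    ([8, 9], [12, 13, 14], [(0, 2), (1, 2)]),
]
batch = collate_graphs(examples)
```

The nested lists would still need to be converted to a LongTensor and a SparseTensor afterwards. What I cannot figure out is how to get the torchtext iterators to carry the node and edge data through to a step like this.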