In torchtext it is possible to create custom datasets by extending torchtext.data.Dataset,
which requires calling the super constructor with a list of examples and the corresponding fields.
However, for my data structure I need to apply numericalization and padding only to certain columns:
fields = [("transcript", self.SOURCE), ("label", None)]  # For the "label" field, apply nothing; just keep the original data structure
The label field is an array of strings, e.g.:
label=[[value1,value2],[value1, value3]]
For now I have to use a nested field for this structure, and at runtime I have to convert the indices back to words, which makes the data significantly harder to process.
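
To illustrate the index-to-word step I would like to avoid, here is a minimal sketch in plain Python (no torchtext involved). The vocabulary list `itos`, the pad token, the index values and the helper name `indices_to_words` are all made up for illustration and only stand in for whatever the nested field actually builds.

```python
# Hypothetical vocabulary: the position in the list is the index ("index to string").
PAD = "<pad>"
itos = [PAD, "value1", "value2", "value3"]


def indices_to_words(batch, itos, pad_token=PAD):
    """Map a padded batch of index lists back to lists of strings, dropping padding."""
    restored = []
    for example in batch:
        words = [itos[i] for i in example]
        restored.append([w for w in words if w != pad_token])
    return restored


# Assumed numericalized and padded form of [[value1, value2], [value1, value3]],
# plus one shorter entry to show the padding being removed.
padded = [[1, 2], [1, 3], [2, 0]]
labels = indices_to_words(padded, itos)
print(labels)
```

This extra round trip has to run on every batch, only to recover strings that were already available before the field was applied.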