Hey, I'm fairly new to deep learning and am currently working on an NLP project for text summarization.
For some reason, when I load the dataset using Fields with fix_length specified, the length of the data points appears unchanged from the original.
from torchtext.data import Field, TabularDataset  # on newer torchtext versions these live under torchtext.legacy.data

# tokenize_en is my English tokenizer function
SRC = Field(tokenize=tokenize_en,
            init_token='<sos>',
            eos_token='<eos>',
            fix_length=400,
            lower=True)

TRG = Field(tokenize=tokenize_en,
            init_token='<sos>',
            eos_token='<eos>',
            fix_length=100,
            lower=True)

fields = {'doc': ('doc', SRC), 'summaries': ('summaries', TRG)}

train_data, valid_data, test_data = TabularDataset.splits(
    path='./',
    train='train.json',
    validation='val.json',
    test='test.json',
    format='json',
    fields=fields)
For example, if I run the line below, the length still exceeds 400:
print(len(vars(train_data.examples[40])['doc']))
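For reference, here is a minimal sketch of how the same check could be done on a padded batch instead, assuming a BucketIterator, a batch size of 32, and that the vocabularies have been built first (I'm not sure whether fix_length is only meant to take effect at that stage):

from torchtext.data import BucketIterator  # on newer torchtext versions this lives under torchtext.legacy.data

# Padding/numericalizing requires vocabularies
SRC.build_vocab(train_data)
TRG.build_vocab(train_data)

train_iter = BucketIterator(train_data,
                            batch_size=32,
                            sort_key=lambda x: len(x.doc))

batch = next(iter(train_iter))
print(batch.doc.shape)  # I would expect torch.Size([400, 32]) if fix_length is applied during padding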
Can someone point out to me what I am doing wrong or perhaps suggest another solution? Thanks for your time!