Using the SNLI Classifier on a different QA Dataset

Salman_Mohammed · May 21, 2017, 5:40pm

Hey guys,

I am trying to use the SNLI Classifier (https://github.com/pytorch/examples/tree/master/snli) on a different QA dataset, TrecQA, as a baseline model. I am having trouble importing the dataset.
The task is more or less the same (premise -> question, hypothesis -> answer) and there are 2 labels instead of 3.

This dataset has 4 files for each of the train/dev/test sets:
ids.txt, questions.txt, answers.txt, labels.txt.

How do I import the dataset in train, dev, and test splits and build the vocabulary like they do in the SNLI example (https://github.com/pytorch/examples/blob/master/snli/train.py)?

Some help will be much appreciated. Thank you!

jekbradbury · May 22, 2017, 9:35am

SNLI is provided as a JSONL file, which means the torchtext JSON loader can be used more or less unmodified. It looks like the TrecQA dataset doesn’t quite match an existing torchtext loader, so you’d have to write a small loader that subclasses torchtext.data.Dataset. I’d look at the TranslationDataset code, which reads parallel files line by line and is quite similar to what you’d need to do.
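
To make the file-reading part concrete, here is a rough sketch in plain Python rather than the torchtext API. It assumes that line i of ids.txt, questions.txt, answers.txt, and labels.txt all describe the same example, that whitespace tokenization is good enough, and that each split lives in its own directory; none of this is confirmed by the dataset description above. The names load_split and build_vocab are made up for illustration. Inside a torchtext.data.Dataset subclass, the same zip loop would build Example objects instead of dicts, much as TranslationDataset does for its two parallel files.

```python
import os
from collections import Counter

# File names taken from the question; the line-by-line alignment is an assumption.
FILE_NAMES = ("ids.txt", "questions.txt", "answers.txt", "labels.txt")


def load_split(split_dir):
    """Read the four line-aligned files of one split into a list of dicts."""
    columns = []
    for name in FILE_NAMES:
        with open(os.path.join(split_dir, name), encoding="utf-8") as f:
            columns.append([line.strip() for line in f])

    # Fail early if the files do not have the same number of lines.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("files in %s have different line counts" % split_dir)

    return [
        {
            "id": example_id,
            "question": question.split(),  # plays the role of the SNLI premise
            "answer": answer.split(),      # plays the role of the SNLI hypothesis
            "label": label,                # kept as a string; 2 classes expected
        }
        for example_id, question, answer, label in zip(*columns)
    ]


def build_vocab(examples, specials=("<unk>", "<pad>")):
    """Map each token to an index, special tokens first, then by frequency."""
    counts = Counter(
        token
        for example in examples
        for key in ("question", "answer")
        for token in example[key]
    )
    itos = list(specials) + [token for token, _ in counts.most_common()]
    return {token: index for index, token in enumerate(itos)}


# Hypothetical usage, assuming a layout like data/train, data/dev, data/test:
# train = load_split("data/train")
# dev = load_split("data/dev")
# test = load_split("data/test")
# vocab = build_vocab(train)
```

Building the vocabulary from the training split only mirrors common practice; whether to include dev and test tokens is a design choice left open here.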