Does torchtext support loading large files (around 2 GB)?
I am unable to load a 2.7 GB JSON file using torchtext.
Do you get a specific error message or does your code just hang?
Could you post a code snippet reproducing this error?
Neither does the code hang nor do I get any error message. I am running it in a notebook and the cell has been executing for a long time (hours).
Code:
train_data, test_data = data.TabularDataset.splits(path=path, train="train.json", test="test.json", format="json", fields=fields)
Is the train.json available somewhere?
If not, could you just post a few sample rows so that I could create a dummy and try it on my machine?
I tried using a small dataset in the same format. With just two samples it loaded in about a minute. I understand my original file has a lot more samples, but it has still been loading for 3-4 hours.
The sample looks like normal JSON, one object per line:
{"key1": value, "key2": value}
{…}
{…}
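For what it's worth, the "json" format of TabularDataset expects exactly this JSON Lines layout (one object per line), and it builds every Example in memory up front. Before waiting hours, it can help to stream the file line by line in plain Python to confirm every line parses and to see how many records there are. This is just a sanity-check sketch, independent of torchtext:

```python
import json

def iter_json_lines(path):
    """Lazily yield one parsed record per line (JSON Lines layout,
    i.e. what TabularDataset's 'json' format expects)."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"bad JSON on line {line_no}: {e}") from e

# Example: count records without holding the whole file in memory.
# n = sum(1 for _ in iter_json_lines("train.json"))
```

If this loop itself takes hours, the bottleneck is the file; if it finishes quickly, the time is going into tokenization and Example construction inside torchtext.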
I have the same problem here. I was trying to load a file that is 1.4 GB, and a few minutes in the process got killed:
>>> REF = data.Field(lower=True, tokenize=tokenize_char, init_token='<sos>',eos_token='<eos>')
>>> SRC = data.Field(lower=True, tokenize=tokenize_char)
>>> train = data.TabularDataset('./train.csv', format='csv', fields=[('src', SRC), ('ref', REF)])
Killed
This works fine for a smaller dataset.
Can anyone tell me why this is happening? Thanks!
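"Killed" with no traceback usually means the OS out-of-memory killer ended the process: TabularDataset materializes every Example in RAM, and the tokenized representation of a 1.4 GB CSV can be several times larger than the file. One workaround (a rough sketch, not a torchtext feature) is to split the file into smaller shards and load/process one shard at a time; if your CSV has a header row, you would also need to copy it into each shard:

```python
import os

def split_into_shards(path, lines_per_shard, out_dir):
    """Split a large line-oriented file (CSV or JSON Lines) into
    smaller shard files so each can be loaded separately."""
    os.makedirs(out_dir, exist_ok=True)
    shard_paths, shard_idx, buf = [], 0, []

    def flush(buf, shard_idx):
        out = os.path.join(out_dir, f"shard_{shard_idx:05d}.txt")
        with open(out, "w", encoding="utf-8") as g:
            g.writelines(buf)
        shard_paths.append(out)

    with open(path, encoding="utf-8") as f:
        for line in f:
            buf.append(line)
            if len(buf) == lines_per_shard:
                flush(buf, shard_idx)
                shard_idx, buf = shard_idx + 1, []
    if buf:  # write the final, possibly short, shard
        flush(buf, shard_idx)
    return shard_paths

# Usage sketch: build a TabularDataset per shard inside a loop
# instead of one dataset over the whole 1.4 GB file.
```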