I’m trying to build an NER (named-entity recognition) LSTM model in PyTorch, and my data looks like this:
input:
input = [["i","am","here"], ["you","are","not","there"]]
and output:
output = [[1, 2,1], [1,2,1,2]]
As you can see, the input sequences have different lengths. Training works with batch_size = 1, but increasing the batch size to anything bigger than 1 raises this error:
Traceback (most recent call last):
File "stage_runner.py", line 28, in <module>
main()
File "stage_runner.py", line 24, in main
job_table[args.job](**params)
File "/Users/arefghodamai/Desktop/Projects/key_value_extraction/src/dvc_dags/train_model.py", line 22, in train_model
for sentence, tags in model_manager.data_loader:
File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 83, in default_collate
return [default_collate(samples) for samples in transposed]
File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 83, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/usr/local/Cellar/python@3.7/3.7.10_3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
Is there any way to make this work, other than padding the sequences?
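For context, one direction I'm considering is a custom collate_fn that simply keeps the sequences as a list of variable-length tensors instead of letting default_collate try to stack them. Here is a minimal sketch (the Dataset class and the word-to-index vocabulary are simplified assumptions, not my real code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Toy vocabulary for illustration only (an assumption, not my real encoding)
word2idx = {"i": 0, "am": 1, "here": 2, "you": 3, "are": 4, "not": 5, "there": 6}

sentences = [["i", "am", "here"], ["you", "are", "not", "there"]]
tags = [[1, 2, 1], [1, 2, 1, 2]]

class NERDataset(Dataset):
    def __init__(self, sentences, tags):
        self.sentences = sentences
        self.tags = tags

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, i):
        x = torch.tensor([word2idx[w] for w in self.sentences[i]])
        y = torch.tensor(self.tags[i])
        return x, y

def collate_keep_lengths(batch):
    # Return lists of variable-length tensors instead of stacking them,
    # which is the step where default_collate fails for unequal lengths.
    xs, ys = zip(*batch)
    return list(xs), list(ys)

loader = DataLoader(NERDataset(sentences, tags), batch_size=2,
                    collate_fn=collate_keep_lengths)

for xs, ys in loader:
    # Each sequence keeps its own length inside the batch.
    print([t.shape[0] for t in xs])  # -> [3, 4]
```

This gets a batch through the loader, but the model then has to iterate over the list (or pack it with torch.nn.utils.rnn.pack_sequence) rather than getting a single stacked tensor, so I'm not sure it's the idiomatic approach.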