I’m processing product data for an e-commerce platform. The goal is to extract and understand each product’s dimensions.
To show how messy the product dimension descriptions are, here are some examples:
Overall Dimensions: 66 in W x 41 in D x 36 in H
Overall: 59 in W x 28.75 in D x 30.75 in H
92w 37d 32h
86.6 in W x 33.9 in D x 24 in H
W: 95.75" D: 36.5" H: 28.75"
W: 96" D: 39.25" H: 32"
118"W x 35"D x 33"T.
28 L x 95 W x 41 H
95" W x 26.5" H x 34.75" D
98"W x 39"D x 29"H
28" High x 80" Wide x 32" Deep
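For reference, most of these formats can at least be *located* (not fully parsed) with a rough regex along these lines. This is just my own sketch, only checked against the samples above, not a vetted pattern:

```python
import re

# Sketch: a number, an optional unit (in or "), and an optional axis label
# (W/D/H/L/T or Wide/Deep/High/Long/Tall), repeated two or three times,
# separated by "x", commas, or whitespace. Placeholder heuristic only.
AXIS = r'(?:Wide|Deep|High|Long|Tall|[WDHLT])'
NUM = r'\d+(?:\.\d+)?'
PART = rf'(?:{AXIS}\s*:?\s*)?{NUM}\s*(?:in|")?\s*{AXIS}?\.?'
PATTERN = re.compile(rf'\b{PART}(?:\s*[x,]?\s*{PART}){{1,2}}', re.IGNORECASE)

def find_candidates(text):
    """Return substrings of `text` that look like dimension snippets."""
    return [m.group(0).strip() for m in PATTERN.finditer(text)]
```

A regex like this can never disambiguate which number is width vs. depth across all the label orderings, which is why I still want the LSTM for the actual parsing.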
For short dimension descriptions (under 60 characters), I trained a two-layer bidirectional LSTM that handles this task very well.
The problem is that these dimension descriptions are usually embedded in a much longer context, as one part of the full product description. How can I extract the relevant span from the long context and parse it, given that my LSTM only accepts inputs of 60 characters? (The general idea is the same as https://towardsdatascience.com/addressnet-how-to-build-a-robust-street-address-parser-using-a-recurrent-neural-network-518d97b9aebd.)
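One workaround I am considering is to slice the long description into overlapping 60-character windows, keep only the windows that plausibly contain dimensions, and feed those to the LSTM. A minimal sketch, where the window size, stride, and pre-filter regex are all placeholder choices of mine:

```python
import re

# Crude pre-filter: three numbers close together, as in
# "66 in W x 41 in D x 36 in H". A placeholder heuristic, not validated.
THREE_NUMBERS = re.compile(r'\d+[^0-9]{1,8}\d+[^0-9]{1,8}\d+')

def char_windows(text, size=60, stride=25):
    """Overlapping character windows over `text`.

    With overlap size - stride = 35, any snippet of up to 35 characters
    is guaranteed to appear whole in at least one window.
    """
    if len(text) <= size:
        return [text]
    out = []
    for start in range(0, len(text), stride):
        out.append(text[start:start + size])
        if start + size >= len(text):
            break
    return out

def candidate_windows(text):
    """Windows worth sending to the 60-character LSTM."""
    return [w for w in char_windows(text) if THREE_NUMBERS.search(w)]
```

Each surviving window fits the LSTM's 60-character input, and per-character predictions could then be merged across overlapping windows. But this feels clunky, which is why I am wondering about other model choices.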
Which kind of language model is better suited to this?