Train multiple languages

Hi, I want to do sentiment analysis with transformers on data where some texts are written in Georgian and some are also in Georgian but written with the English (Latin) alphabet. What should I do? What is the usual pipeline for such cases? Should I convert everything to one alphabet, or is there a model that can learn from such mixed data?

It's kind of up to you. I would probably train two separate models, one per script, to keep each vocabulary small(er).

In any case, the model does not care. Once you have created your vocabulary (or vocabularies) and converted all texts to sequences of word indices, the model will accept them just fine.
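To make the "vocabulary, then word indices" step concrete, here is a minimal sketch in plain Python. It assumes simple whitespace tokenization and a single shared vocabulary for both scripts; the helper names `build_vocab` and `encode`, the special tokens, and the example sentences are all made up for illustration. A real transformer setup would more likely use a subword tokenizer.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens

def build_vocab(texts, min_freq=1):
    """Map each whitespace-separated token to an integer index."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {PAD: 0, UNK: 1}
    # Most frequent tokens get the smallest indices.
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert a text to a list of indices; unseen tokens map to UNK."""
    return [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()]

# Made-up examples: Georgian script and the same words in Latin script.
texts = ["კარგი ფილმი", "kargi filmi", "ცუდი ფილმი"]
vocab = build_vocab(texts)
encoded = [encode(t, vocab) for t in texts]
print(encoded)
```

With a shared vocabulary, "ფილმი" and "filmi" get unrelated indices, so the model has to learn on its own that they mean the same thing. Training two models would simply mean calling `build_vocab` once per script.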

Hi, thank you for your reply. I kind of solved that one, but I have another question:

What would you do if you had two different languages in the data, where one language is very dominant and the second has very few samples, and it all has to be handled by just one model?